ABSTRACT


The dynamic allocation of limited processor and main memory 
resources among the members of a user community is investigated 
as a supply-and-demand problem. The work is divided into four 
phases. 

The first phase is the construction of the working set model 
for program behavior. This model is based on locality, the con- 
cept that, during any interval of execution, a program favors a 
subset of its information; a computation’s working set is a dynamic
measure of this set of favored information. A working set
storage management policy is one that allocates processors to 
a computation if and only if there is enough uncommitted space 
in main memory to contain its working set. Under such a policy, 
a computation acquires and releases storage as needed, indepen- 
dently of other computations; because computations are thus made 
statistically independent, it is possible to derive many detailed 
properties of such policies, both in shared and unshared situations. 

The second phase is to define and study the properties of 
system demand. A computation is regarded as the basic demand- 
making entity, placing demands jointly on processor and main mem- 
ory resources. Its system demand is a pair (processor demand, 
memory demand), where its processor demand represents its immediate
processor requirement (intensity and duration), and its memory
demand represents its immediate main memory requirement (its
working set size). 

The third phase is to define and study the properties of 
system balance. Computations that demand resources are segre- 
gated into two classes: the first class, called the standby set, 
is temporarily denied the use of system resources; the second 
class, called the balance set, is granted the use of system re- 
sources. The system is balanced when the total system demand 
of the balance set matches the system capacity. A balance policy 
is a resource allocation policy that regulates membership in the 
balance set so that balance is maintained. Balance policies are 
formulated as mathematical programming problems whose solutions
are found dynamically by the scheduler. 

The fourth phase is to apply all these ideas to the design
and administration of multiprocess computer systems. A relation
describing the equipment configuration is derived; suggestions 
for processor and multilevel memory system design are made. Per- 
formance measures are discussed. 

This work is intended to be a new approach to modelling the 
behavior of ongoing computations. It is intended to be a general, 
unified philosophy about allocation and sharing. It is intended 
to spark new thinking about the design and administration of 
multiprocess computer systems. 
NOTATION


To prevent confusion, we list here the major notational 
conventions we have used in this thesis. 


Notation                                  Explanation

$x \in A$                                 x is an element of the set A.

$A = \{x \mid P\}$                        A comprises all elements x
                                          having the property P.

$|A|$                                     the number of elements in A.

$A \subset B$                             the set A is contained in the
                                          set B; i.e., every element of
                                          A is also an element of B.

$B = \bigcup_{i \in I} A_i = \{x \mid x \in A_i,\ \text{some } i \in I\}$
                                          definition of the union of sets.

$B = \bigcap_{i \in I} A_i = \{x \mid x \in A_i,\ \text{all } i \in I\}$
                                          definition of the intersection
                                          of sets.

$\Pr[A]$                                  probability of the event A.

$\Pr[A \mid B]$                           probability of the event A,
                                          conditioned on the occurrence
                                          of the event B.

$F_x(u) = \Pr[x \le u]$                   probability distribution function
                                          for the random variable x.

$f_x(u) = \frac{d}{du} F_x(u)$            probability density function
                                          for the random variable x.

$\bar{x} = \int u\, f_x(u)\, du$          the mean, or expectation, of
                                          the random variable x.

$\overline{x^2} = \int u^2 f_x(u)\, du$   the second moment of the random
                                          variable x.

$\sigma_x^2 = \overline{x^2} - \bar{x}^2$ the variance of the random
                                          variable x.

$\overline{g(x)} = \int g(u)\, f_x(u)\, du$
                                          expectation of the function
                                          g(x) of the random variable x.

$\overline{g(x,y)} = \int g(u,y)\, f_x(u)\, du$
                                          expectation with respect to x
                                          of the function g(x,y) of two
                                          random variables x and y.
CHAPTER 1


The Resource Allocation Problem 


1.0. Introduction to the Resource Allocation Problem 

The desire for a general purpose community computing fac- 
ility -- a computer utility -- has motivated recent trends in 
computer design. Just as electric power is distributed to the 
members of a community to satisfy their electromechanical needs, 
so-called computing power can be distributed to the members of 
a community to satisfy their information processing needs. 

The essence of a computer utility can be captured in one 
word: sharing. By sharing computing resources, the users dis- 
tribute the costs, and each pays less. By sharing information, 
one user may build on the work of others, and advance more ra- 
pidly in his own work. Sharing benefits the system, too, for 
the system may select from a wide range of instantaneous demands 
those that tend to improve its efficiency. Resource allocation 
is the problem of distributing limited resources among members 
of the community. 

In recent years we all have watched the evolution of sophisticated
techniques for sharing of equipment and information, techniques such
as multiprogramming, multiprocessing, multi-accessing [C7,C8,D7,P2],
segmentation and paging [D8], and traffic control [S2]. Computer
systems using these techniques have not always met expectations. For
example, it has been observed that the efficiency of paged memory
systems has often been much less than anticipated. There have even
been instances of unexpected behavior. It has been observed that a
system may be processing a set of programs using all the available
memory and processor resources, yet introducing one additional,
average-sized program into memory can trigger a total collapse
of service efficiency, leaving almost all the processors idle.
This phenomenon, known as thrashing, at first defies our intuition,
which would instead lead us to expect gradual degradation of service
as additional programs are squeezed into memory.

What causes thrashing? When multiprogramming a memory, what 
is the smallest subset of each program that ought to reside there? 
What (perhaps unwanted) interactions take place among programs 
that compete for the same equipment? Given a set of programs, 
what should be the configuration of processor and memory resources 
to serve them best? What is the best scheduling policy? The 
best storage management policy? How does one predict the resource 
requirements of a program when nothing is known about it before- 
hand? How can one tell if the system is behaving properly? 

The lack of answers to some of these questions, the intense 
debate over others, and the existence of yet unasked questions, 
lead inescapably to this simple conclusion: we do not understand 
the behavior of ongoing computations. 

Thus, multiprogramming, multiprocessing, and all the other
techniques are not solutions to the resource allocation problem;
they are but tools by which a solution may be implemented.


It is the purpose of this thesis to start filling the gap, 
to develop new approaches to modelling the behavior of comput- 
ations, to spark a new way of thinking about programs in exe- 
cution, to evolve a general, unified philosophy about resource 
sharing and allocation. 

I felt that an interesting and useful solution to the re- 
source allocation problem should be based on the ideas of supply- 
and-demand economics in a free-enterprise market; and this think- 
ing underlies my work. I wanted to formulate resource alloc- 
ation as the problem of selecting fairly from all user demands 
a subset whose total demand balances the supply; I wanted the 
solution to be applicable across a wide range of computer systems,
large and small, existing and proposed, from Multics [C8],
to Dijkstra’s harmonious society of cooperating sequential processes
[D11], to the highly parallel machines of Dennis [D10]
and Slotnick [S6]; I wanted the solution to be unified in the 
sense that processor and memory allocation are handled together, 
not in two separate decisions. To accomplish these goals, I 
approached the problem in four phases. 

The first phase was the construction of an abstract model 
for program behavior. This model, the working set model, makes 
it possible to decide which information is in use by a single 
computation or set of computations; intuitively, a computation’s 
working set of information is the smallest collection of infor- 
mation that must be present in main memory for it to operate 
efficiently. The working set model is based on the concept of 
locality, the idea that a computation will, during an interval 
of time, favor a subset of the information available to it;
the working set is a dynamic measure of this set of favored
information. A working set memory management policy is one
that guarantees a computation shall receive the use of processors
if and only if its working set is present in main memory.
Under such a policy, computations are made independent, the 
memory acquisitions of one computation being unaffected by those 
of another; thus, unwanted interactions among computations aris- 
ing from competition for memory and processor resources may be 
eliminated. Under such a policy a computation acquires more 
or less memory in accordance with its needs. 
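
The idea can be illustrated with a minimal sketch. It assumes a window-based measure of the working set (the pages referenced among the most recent tau references), which is one common way of making "favored information" precise; the formal definition belongs to Chapter 3 and is not reproduced here. The names `working_set`, `may_run`, `tau`, and the reference string are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Set


def working_set(refs: Sequence[int], t: int, tau: int) -> Set[int]:
    """Return the pages referenced among the last `tau` references up to time t.

    Assumption: refs[k] is the page referenced at time k + 1, with one
    reference per unit time.
    """
    start = max(0, t - tau)
    return set(refs[start:t])


def may_run(ws_size: int, uncommitted_pages: int) -> bool:
    """Working set policy: grant processors only if the working set fits
    in the uncommitted portion of main memory."""
    return ws_size <= uncommitted_pages


# Hypothetical reference string: the computation drifts from pages
# {1, 2, 3} toward pages {7, 8}.
refs = [1, 2, 1, 3, 1, 2, 7, 7, 8, 7]
early = working_set(refs, t=6, tau=4)    # pages seen in references 3..6
late = working_set(refs, t=10, tau=4)    # pages seen in references 7..10
```

The working set shrinks from three pages to two as the locality changes, so under this sketch the computation would release storage without regard to what other computations are doing.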

The second phase was to define demand. Observing that a
computation jointly demands the use of processor and memory resources,
we defined a computation’s system demand to be a pair

        (processor demand, memory demand)

where the processor demand represents the computation’s immediate
processor requirement (intensity and duration), and the memory
demand represents the computation’s main memory requirement
(its working set size).

The third phase was to investigate system balance. We will say 
that the system is balanced when the sum total of the demands 
of active computations matches the available equipment. This 
set of active computations will be called the balance set. A 
balance policy is a resource allocation policy that regulates 
membership in the balance set so that the balance set, regarded 
as a super-computation, has known characteristics; its total
demand is maintained within close tolerance of whatever is required
to match the equipment. We have been able to formulate
the problem of deciding which computations are to be members
of the balance set as a mathematical programming problem, whose
solution is found dynamically by the scheduler.
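
To make the notion of a balance set concrete, here is a rough sketch under strong simplifying assumptions. It is not the mathematical programming formulation of Chapter 7: it uses a plain greedy admission rule in arrival order, and it reduces processor demand to a single number (processors' worth of demand), ignoring duration. The names `Demand`, `choose_balance_set`, `n_processors`, and `m_pages` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Demand:
    name: str
    processor: float  # assumed: processors' worth of demand (intensity only)
    memory: int       # working set size, in pages


def choose_balance_set(
    waiting: Sequence[Demand], n_processors: float, m_pages: int
) -> Tuple[List[Demand], List[Demand]]:
    """Greedily admit computations, in order, while the total demand of the
    admitted set stays within the N processors and M pages available.

    Returns (balance_set, standby_set).
    """
    balance: List[Demand] = []
    standby: List[Demand] = []
    used_p, used_m = 0.0, 0
    for d in waiting:
        fits = (used_p + d.processor <= n_processors
                and used_m + d.memory <= m_pages)
        if fits:
            balance.append(d)
            used_p += d.processor
            used_m += d.memory
        else:
            standby.append(d)
    return balance, standby


waiting = [
    Demand("A", 1.0, 40),
    Demand("B", 0.5, 50),
    Demand("C", 1.0, 30),
    Demand("D", 0.5, 10),
]
balance, standby = choose_balance_set(waiting, n_processors=2, m_pages=100)
```

With these made-up numbers, computation C is held back because admitting it would exceed the processor capacity, while D still fits and brings the admitted demand to exactly (2 processors, 100 pages).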


The fourth phase was to apply all these ideas to the design 
and administration of computer systems. A particularly important 
result is: the proper ratio of processor to memory (that is, the 
equipment configuration) that achieves some efficiency level is 
determined not only by the statistics of program size and duration,
but also by the access time of auxiliary storage devices.
We are also able to make suggestions about processor design, multilevel
memory system design, and performance measurements.


1.1. Plan of the Thesis 

The thesis is organized into four parts. Chapters 1 and 2 
review the concepts with which we want the reader to be familiar; 
Chapters 3, 4, and 5 deal with the working set model; Chapters 6 
and 7 investigate demands, balance, and balance policies; and 
Chapters 8 and 9 look into implications the models have on sys- 
tem design and administration. 

The remainder of the discussion here in Chapter 1 falls into 
two categories: constraints and economics. The most important 
constraint within which we assume a solution to the resource al- 
location problem must function, programming generality, is the 
independence of an algorithm description from the environment in 
which it operates. One of the consequences of this constraint 
is that the computer system must predict, without outside 
assistance, the demands of the computations it executes. A dis- 
cussion of basic supply-and-demand economic theory is included 
to illustrate how pricing policies can be used to regulate demands. 
Chapter 2 reviews basic multiprocess computer system concepts. 

In Chapter 3 we define the working set model for program 
behavior and show that a working set memory management policy 
is the optimum of all policies that must operate without know- 
ledge of future reference patterns made by computations. In 
Chapter 4 the working set model is refined and a great many of 
its properties are derived. The discussion of Chapters 3 and 4
is restricted to the case in which no information is shared; ac- 
cordingly we examine in Chapter 5 the effects of sharing. We 
show how dramatically sharing can improve efficiency and reduce
the resource usage costs attributed to a particular user.


In Chapter 6 we present the formal definitions of demand 
and balance and discuss basic aspects of balance policies. Chap- 
ter 7 is devoted to formulating balance policies as mathematical 
programming problems. Such formulations have a dual advantage: first,
we need not find explicit solutions for the balance policies as
long as we can convince ourselves that the scheduler is dynamically
finding them; and second, we are assured that the policies
are optimum since the objective functions are clearly stated. 

Chapter 8 deals with applications to computer system design. 
We derive a relation specifying processor-memory configuration, 
we show that pooling of hardware at a fine level of detail can 
achieve the effect of a large number of processors with a small 
amount of hardware, and we discuss organization and management 
of multilevel memory systems in light of generalized working set 
models. 

Chapter 9 deals with performance measures. Given the models 
and formulation of the solution to the resource allocation prob- 
lem, the performance measures are determined, so Chapter 9 merely 
collects together the major measures discussed in earlier chapters.

The reader who merely wants to get a detailed overview of 
the major work of this thesis, without having to dig through the 
detailed properties of our models, need only read Chapters 1,
3, 6, and 8, for that is where the main thread lies.


1.2. The Problem and Its Constraints

We have formulated the problem in the context of a multi- 
process computer system; we presume that the reader is already 
familiar with multiprocess computer system objectives, the particular
details of which may be found in references [C8,F1,P2,V2].
Specific implementation concepts will be reviewed in Chapter 2. 
The properties that constrain and complicate the solution to the 
resource allocation problem are discussed below. 

The specific problem toward which this thesis work has been 
directed is: 

        To formulate behavior models of computations in multiprocess
        computer systems; then, using the models, formulate a unified
        approach to dynamic allocation of processor-memory resources
        among computations, balancing supply against demand under
        appropriate criteria of fairness.

We have omitted discussion of input-output allocation for three 
reasons. First, we assume the rate at which a user interacts
with his computation is very much slower than
the time rate at which execution proceeds; we are interested pri- 
marily in dynamic resource allocation in the intervals between 
interactions. Second, we feel that the models are general enough 
so that generalization to resource types beyond processor and 
memory will be straightforward. Third, we feel that all the 
rich complexity of the resource allocation problem can be found 
entirely in the processor-memory problem. 

We assume the existence of two kinds of constraints: limited 
equipment and programming generality. 

The limited equipment constraints center on the existence
of only a fixed, finite amount of processor and memory resources.
There are N identical processors, each of which can deliver
information references at the rate of one per unit time; since
the processing rate is bounded, the duration of a program’s
execution enters the problem. We assume the standard unit of
information storage and transmission is a page, and that the
capacity of directly-addressable, main memory is M pages.
Whenever we talk of the equipment, or the resources, we specifically
mean the N processors and the M pages of main memory.

The remaining constraints center on the issue of programming
generality [D10], which is the independence of an algorithm
description from the environment in which it operates.
Programming generality includes


1. The ability to move a program between installations, 
either manually or automatically (e.g., via computer 
networks). 

2. The ability to use a program, without changes, des- 
pite changes to the hardware or to the hardware con- 
figuration. 

3. The ability to use one program in the construction
of another -- to build on the work of others, and to
share information dynamically.


This third aspect implies that programs will be modular in con- 
struction (i.e., programs will be segmented [D8]). Once compiled, 
a program module should be usable without recompilation as a building
block of any program whatever. To exhibit programming
generality, the computer system must permit a program module to:


1. Create data structures of arbitrary size unknown prior
to execution.

2. Call on further procedures unknown to the caller (which
may call on still other procedures, etc.).

3. Transmit arbitrarily complex data structures as arguments.


These last three points, centering on data dependence, imply that
a module’s resource requirements not only will be unknown prior
to its execution but also will be indeterminable. Thus, the programming
generality requirement places these constraints on the
resource allocation problem:


1. The computer system, not the programmer or the compiler,
must decide for itself where in the memory hierarchy
information is to reside [D4,D10].

2. Algorithms must be configuration independent. Information
references must be made by means of a location-independent
addressing mechanism.

3. Information flows upward in the memory hierarchy only
on demand, being moved into main memory only when it is
referenced by a computation. Information flows downward
in the memory hierarchy as it falls out of use.
Arbitrary collections of programs will demand to share
arbitrary sets of data. Many programs will reside simultaneously
in main memory (multiprogramming) and many
processes will be active concurrently (multiprocessing).


In order to be consistent with programming generality, we
have assumed the no-advance-information constraint, namely that
programmers and compilers will, because of data dependence, be
unable to make reliable advance estimates about the resource needs
of their own programs¹. In addition, any advice that is obtained
from a programmer cannot necessarily be regarded as useful advice 
even if it may be reliable: a user would intend to optimize the 
environment for his own program -- configuring resources to suit 
an individual may interfere with overall good service to the community.
In order to guard against dishonest users who attempt
to secure better service by misrepresenting their needs, the system
must monitor program behavior, and impose penalties for bad
estimates. The additional overhead to do this may not be worth 
the cost. 

Since it is not at all clear that advice obtained from pro- 
grammers or compilers can be of any real value, we have chosen 
to formulate a solution to the resource allocation problem in 
the case where there is no advice, where the computer system 
must discover for itself how programs behave. Clearly there will 
be situations in which advice can be useful, but these are not
of interest to us here. 

In the interest of programming generality we make the fol- 
lowing distinction between the tools and the methods of resource 
allocation: 

1. The mechanisms, or machinery, of assigning and releasing
equipment must operate on a low level in that they deal
directly with the hardware features of the system. Some
of these tools include multiprogramming, multiprocessing,
segmentation, paging, interprocess communication, etc.

2. The policies of resource allocation operate on a higher
level, in that criteria used to determine when equipment
is to be allocated to a computation can be machine-independent.
They are machine-independent inasmuch as
no detailed knowledge of machine organization is necessary
or even relevant.

¹ There have been attempts to do this. Ramamoorthy [R1], for example,
has a proposal for automatic segmentation of programs during
compilation.

Such a functional separation permits changing policies without 
changing machinery, if the machinery is properly defined. 

Since many of the mechanical aspects of resource sharing
already have substantial solutions, we begin investigation at the 
machine-independent level. Resource allocation policies may be 
grouped into classes: 

1. Short-term policies, which must be handled by the computer
system, since decisions must be made in a time
scale far faster than human response.

2. Long-term policies, primarily economic, which control
demands over long periods of time.

In our work here, short-term policies are concerned with matching 
the demand to the supply, long-term policies with matching the 
supply to the demand. 

The bulk of the thesis is concerned with models that show 
how to define the short-term, balance policies. After a detailed 
discussion in the next section of why balance was chosen as a 
resource allocation goal, we turn attention to a discussion of
the long-term, economic policies.
1.3. Why Balance?


There were good reasons to choose balance as the objective 
of resource allocation policies, rather than other criteria such 
as maximum equipment utilization or minimum response time. 

The most important reason, already stated, is our desire 
to be consistent with the ideas of supply-and-demand economics. 

The remaining reasons are the results of this thesis. We 
state them here, although many of their justifications will not 
come completely to light until Chapter 7. 

First, balance is conceptually simple and mathematically tractable,
and it ensures a reasonable policy with respect to criteria
such as maximum equipment utilization or minimum response time. 

Second, we will show that its relative simplicity not only 
makes performance testing and evaluation straightforward but also 
makes clear which parameters are important. Moreover, its rel- 
ative simplicity makes implementation easy. 

Third, we will show that balance exercises control over the 
factors that cause thrashing; recall that thrashing denotes the 
sudden collapse of service efficiency that may occur when too 
many programs are squeezed into main memory. 

Fourth, balance compromises between the conflicting objectives
of fast fair service and low equipment idleness. We illustrate
the dichotomy. Figure 1-1a shows a server, before which
is a queue of demands for its use, the average number in the system
being n; we regard n as a measure of the demand for use of the
server. When there are no demands in the system, the server is
idle, an event occurring with probability P_0. The average wait
in the system is w. Figure 1-1b shows how P_0 varies with n.

Figure 1-1. Tradeoff between waiting time and idleness.

Figure 1-1c illustrates that under a fair service discipline
(i.e., one in which waiting time depends only on order of arrival)
the expected wait varies linearly with n (see reference [S1],
p. 42). Figure 1-1d, showing directly the relation between w
(service) and P_0 (idleness), is constructed by choosing various
n and finding the corresponding w. In general, as P_0 decreases,
w increases: there is an inverse relation between fast fair service
and low equipment idleness. As we will see, balance exercises
control over this relation.
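
The tradeoff can be tabulated with a small sketch. The text does not name a queueing model, so this example assumes the simplest one: a single server with Poisson arrivals, exponential service at rate mu, and first-come-first-served order (an M/M/1 queue). Under that assumption, with utilization rho, the standard results are n = rho/(1 - rho), P_0 = 1 - rho = 1/(1 + n), and w = (n + 1)/mu, which is linear in n as the text states. The function names and the value of mu are hypothetical.

```python
def idle_probability(n: float) -> float:
    """P_0 for an M/M/1 queue whose mean number in system is n."""
    return 1.0 / (1.0 + n)


def mean_wait(n: float, service_rate: float) -> float:
    """Mean time in system w for an M/M/1 queue; linear in n."""
    return (n + 1.0) / service_rate


def tradeoff_table(loads, service_rate):
    """Return (n, P_0, w) triples, one per load level."""
    return [(n, idle_probability(n), mean_wait(n, service_rate)) for n in loads]


if __name__ == "__main__":
    # Assumed service rate: 2 demands served per unit time.
    for n, p0, w in tradeoff_table([0, 1, 3, 9], service_rate=2.0):
        print(f"n={n:>2}  P_0={p0:.2f}  w={w:.2f}")
```

Eliminating n gives w = 1/(mu * P_0) for this model, so driving idleness toward zero makes the wait grow without bound.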

Fifth, we will show that a balance policy can be implemented 
in a relatively load-independent way, the amount of work needed 
to maintain balance depending on the distance of an actual load 
point to a desired load point. 

Sixth, the abstract model of a balanced computer system will 
show the relation between equipment configuration, the auxiliary
memory access time, and balance.
1.4. Supply-and-Demand Economic Principles


This section surveys aspects of the economic structure underlying
our thinking. We consider here a form of supply-and-demand
computer system economics.

One motivation for a multiprocess computer system is econo- 
mic: there is a community of users, who individually would be 
unable to afford the full services of a computer system, but who 
collectively can pay the costs. This goal -- cheap computing --
is not attainable solely within a multiprocess computer system.
The ability of one user to share and build on the work of others 
is a far more compelling motivation. Yet sharing complicates, 
among other things, the problem of charging users for resource 
consumption, because now the cost of a shared resource must be 
attributed to the participants in accordance with their degrees 
of participation. 

The perhaps overworked term computer utility can be misleading, 
for it is not entirely analogous to the public utilities as we 
know them. Contemporary public utilities are rather large eco- 
nomic systems where the average demand is known to vary slowly; 
for the immediate future, computer utilities will be rather small 
economic systems, subject to fast-changing demand. Public util- 
ities are relatively much larger than computing systems; in a 
computer utility, any user can easily demand every resource. 
Public utilities have physical limits on the quantity of service 
a customer can obtain (a 150 ampere main circuit breaker in his 
house, a 3-inch water main, or 2 telephones), and this need not 
be the case for computer utilities. 

From now on we shall refer to the management personnel of
the computer system as the administration. It is the responsibility
of the administration to properly manage the system, deciding
who is to use it, how prices are to be set, what additional equipment
is to be purchased, what services are to be offered. Nevertheless,
the burden of managing the system lies mostly on the
system itself; for example, it must provide automatic metering
of resource usage and maintain data on demands. The models we
set up will make it clear what should be metered and how, and
what demand distributions should be determined and how.


1.4.1. Demand Curves 
The administration can exercise economic controls over the
demands of the user community by means of the prices it charges.
Figure 1-2 shows an elementary demand curve, typifying the
relation between price per unit resource and the total demand
from the community. We observe that the higher the price per
unit resource the less is the total community demand. Point A
is the intersection between the amount R of resource currently
provided by the system and the demand curve. If the price is
less than P_A the user community demand will exceed the supply R.
If the administration wishes to hold some resource in reserve,
leaving only a fraction a of the R available, it must raise the
price to P_B. We do not wish to consider issues such as how to
set price to maximize profit, what to do if the demand curve is
time varying, how long it is until a price change is felt, or
whether instability will result from feedback between demand
and price. The point is: price is a lever for controlling the
total community demand.

Figure 1-2. An elementary demand curve (axis: total community demand).
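
As a worked illustration of the price lever, the sketch below assumes a linear demand curve, D(p) = d_max - slope * p. The text says only that demand falls as price rises, so the linear shape and every number here are assumptions made for the example. `price_for_demand` inverts the curve to find the price at which community demand equals a target amount of resource.

```python
def demand(price: float, d_max: float, slope: float) -> float:
    """Total community demand at a given price (assumed linear curve)."""
    return max(0.0, d_max - slope * price)


def price_for_demand(target: float, d_max: float, slope: float) -> float:
    """Price at which the assumed linear demand curve equals `target`."""
    if not 0.0 <= target <= d_max:
        raise ValueError("target demand must lie between 0 and d_max")
    return (d_max - target) / slope


# Hypothetical figures: demand of 200 units at zero price, falling by
# 10 units for each unit increase in price; the system supplies R = 100.
D_MAX, SLOPE, R = 200.0, 10.0, 100.0

p_a = price_for_demand(R, D_MAX, SLOPE)          # demand just matches R
p_b = price_for_demand(0.75 * R, D_MAX, SLOPE)   # only 75% of R in use
```

Below p_a the assumed demand exceeds the supply R; holding a quarter of R in reserve requires the higher price p_b.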

The demand curve of Figure 1-2 represents the behavior of 
some economic community in statistical steady state; therefore 
no claim can be made that at any particular time the demand curve 
is reliable. This further motivates the idea of dynamic balance: 
at any time the demand is closely regulated, known to be within 
close tolerance of the desired level. 

An essential component of a supply-and-demand pricing policy 
is the ability for a user to bid. Should a user desire improved 
service (at correspondingly higher prices) he may outbid his fel- 
lows. Should a user be unconcerned with the quality of service, 
he may underbid, obtaining poorer service at reduced price. By 
assuming the existence of a bidding mechanism, we may ignore cer- 
tain delicate questions surrounding the issue of user dissatis- 
faction; that is, we will not attempt to model dissatisfaction, 
hoping that unhappy users will raise their bids, or leave. We 
shall discuss details of bidding mechanisms in Section 1.4.3. 

Such an atmosphere of free enterprise, incorporating supply- 
and-demand resource allocation and competitive bidding for pri- 
ority, can quite possibly wreak havoc with computer system econo- 
mics, there being a serious threat of inflation. In terms of 
Figure 1-2, bidding gradually forces the demand curve up and to 
the right. There are two extremes of thought concerning the ad- 
ministration’s posture toward inflation: 

1. Do nothing. Just as other public utilities do, meter
resource usage, but allow users as much as they need.
This means that the administration must be willing to
expand the system, adding new equipment so long as
someone is willing to pay for it. It also means that
the administration must be able to detect trends in the
community demand so that it can decide far enough in
advance to order new equipment.

2. Tight controls. The administration should exercise control
over the total demand by allocating resource quotas
to users, and by limiting the total number of users. 
This in some ways resembles the policies of parking lot 
officials, who allocate 150 stickers to fill 100 spaces, 
on the grounds that (on the average) only 100 cars will 
show up. The quotas allocated will depend on careful 
interpretation of demand statistics, and should be set 
so that the number of users trying to use the system 
at one time will present a total demand only slightly 
larger than system capacity. 

By itself, the first alternative is not workable because there
is a physical limit to how much a particular installation may be
expanded, and users seem always to manage to find problems that
consume the capacity of the system, no matter how large it is.
By itself, the second alternative is not workable because it implies
gradual degradation of service, since the system cannot
meet the needs of the existing community. A truly flexible pos- 
ture is a compromise between the two extremes: the administration 
must be prepared both to enforce controls and to expand capacity. 
[But who is to ensure that the administration indeed takes such
steps, when it is the one who profits by the inflation?] 

In order to implement the compromise, the administration
must monitor performance and detect overload. Overload may be
defined as follows. First set tolerance limits on service, such
as maximum allowable response time, or minimum allowable service
rate (that is, the fraction actually received of the resource
demanded). Overload exists when the probability that service is
not within the set limits exceeds some specified number; this
probability is measured as the fraction of time service is poor.
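
The overload test just defined can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the parameters `max_response` and `tolerance`, and the use of equally spaced response-time samples to estimate the fraction of time service is poor are all hypothetical and do not come from the text.

```python
def poor_service_fraction(response_times, max_response):
    """Fraction of equally spaced samples whose response time exceeds the limit."""
    if not response_times:
        return 0.0
    poor = sum(1 for r in response_times if r > max_response)
    return poor / len(response_times)


def is_overloaded(response_times, max_response, tolerance):
    """Overload exists when the estimated probability of poor service exceeds tolerance."""
    return poor_service_fraction(response_times, max_response) > tolerance


# Hypothetical response-time samples, in seconds.
samples = [1.0, 2.5, 0.8, 4.0, 3.2, 1.1, 0.9, 5.0]
print(poor_service_fraction(samples, max_response=3.0))
print(is_overloaded(samples, max_response=3.0, tolerance=0.25))
```

A minimum allowable service rate could be checked the same way by counting samples that fall below the limit instead of above it.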

Even if the administration decides against a
quota system, the users may still desire some such system, for
self-protection. By having a self-imposed quota, a user can pro-
tect his pocketbook from a berserk computation. Should one of his
programs run amuck, a quota would be exceeded and execution in-
terrupted, the user being asked to decide whether to continue.
Moreover, there should be some means whereby a user controls dis- 
tribution, among his own computations, of whatever resources have 
been allocated to him. This is particularly useful if the user 
supervises some project and desires to control spending by sub- 
ordinates. 

What is to be done when total demand temporarily exceeds 
capacity? Should all jobs be given equally poor service? Or, 
should jobs be divided into two classes, one to receive good ser- 
vice, the other to receive no service at all? As we shall see, 
the first alternative results in a high rate of resource multi- 
plexing and can easily cause thrashing; the second alternative 
may result in some jobs receiving no service. Balance can be 
used as a compromise: the balance set is that subset that receives
all the service, but the membership in it is constantly changing.

1.4.2. Priorities

In general, the higher a job’s priority, the better the ser- 
vice it obtains. Basically, there are three classes of priority 
used in today’s computer systems [C3]: 

1. Bought, by paying extra for better service. An ex- 
ample is the bidding mechanism discussed in the next 
section. 

2. Acquired, by displaying favorable or unfavorable char- 
acteristics during execution. An example is the CTSS 
multilevel queue [C6, S3] in which long jobs receive
little attention. 

3. Deserved, by displaying favorable characteristics in 
advance of execution. An example, again, is CTSS 
[C6, S3], which gives jobs of small memory requirement
better treatment than those of large memory requirement. 

A given computer system may employ a combination of these three 
types of priority. 

In our work here we shall consider only bought priority,
and ignore acquired and deserved priorities. We ignore acquired
priority because we deal with only completely fair resource al-
location policies. We ignore deserved priority because we assume
there is to be no advance allocation information.

1.4.3. Bidding

We shall adopt the point of view that the bidding mechanism
is a method by which a user purchases priority from the computer
system (Kleinrock uses the more colorful term bribing [K2]). The
cost of priority will be added to a user’s resource-consumption
costs. If a user buys higher than average priority, the cost is
positive (his bill is increased); if he buys lower than average
priority, the cost is negative (his bill is reduced). By adop-
ting this view, we ensure that inflation due to bidding is on
the cost of the priority and not on the cost of the resources
themselves.

Let $(p_1, p_2)$ be an interval of the real line; any point p
in $(p_1, p_2)$ is a possible priority. If a user takes no action
to obtain priority, he is assigned some standard priority $p_0$.
Otherwise he selects some priority p from $(p_1, p_2)$. There is
a cost-of-priority function G(p,t) satisfying

$$
\begin{aligned}
G(p,t) &> 0 \quad \text{if } p \in (p_0, p_2) \text{ at time } t \\
G(p,t) &= 0 \quad \text{if } p = p_0 \text{ at time } t \\
G(p,t) &< 0 \quad \text{if } p \in (p_1, p_0) \text{ at time } t
\end{aligned}
\tag{1.4.1}
$$

Let $c_k(T)$ represent the resource-consumption cost for user k
in the real time interval T; then user k would be billed

$$
c_k(T) + \int_T G(p,t)\, dt \tag{1.4.2}
$$

There is clearly an incentive for a user to underbid his fel-
lows, and a restraint against his overbidding.
Let $p^{(1)}, \ldots, p^{(n)}$ be the priorities of each of the n users at
a certain time, and define the average priority to be

$$
q = \frac{1}{n} \sum_{k=1}^{n} p^{(k)} \tag{1.4.3}
$$

An interesting example of a cost function is

$$
G(p,t) = G(p) = c_0 A^{h(p)} \log_A h(p), \qquad h(p) = \frac{p}{q} \tag{1.4.4}
$$

for suitable constants $c_0$ and A. The reader can verify that G(p)
satisfies properties (1.4.1). Since G(p) increases exponentially
with the deviation from the mean q, it is possible to penalize
a user severely for large deviations. This discourages those
who would outbid the entire community and pre-empt all service
for themselves. Observe, however, that if everyone bids high,
q increases and the relative cost of a high bid is less. Thus
inflation can be a serious problem (but note: the inflation is
on the cost of priority, not on the cost of resources, and is
not as serious as inflation of the resource costs themselves).
The administration can control inflation by replacing q with $p_0$
in eq. 1.4.4, and making $p_0$ smaller than the existing q.
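
The behavior of such a cost function can be explored numerically. The sketch below is an illustration only: it assumes the cost function has the form G(p) = c0 * A**h * log_A(h) with h = p/q, which is one reading of eq. 1.4.4, and it approximates the integral in eq. 1.4.2 by a sum over equal time steps. The function names and default constants are hypothetical.

```python
import math


def priority_cost(p, q, c0=1.0, A=2.0):
    """Assumed form G(p) = c0 * A**h * log_A(h), h = p / q (needs p, q > 0 and A > 1)."""
    h = p / q
    return c0 * (A ** h) * math.log(h, A)


def bill(consumption_cost, priorities, averages, dt=1.0, c0=1.0, A=2.0):
    """Consumption cost plus a step-wise approximation of the integral of G(p, t) dt."""
    priority_charge = sum(priority_cost(p, q, c0, A) * dt
                          for p, q in zip(priorities, averages))
    return consumption_cost + priority_charge


q = 0.5
print(priority_cost(0.5, q))    # zero at the average priority
print(priority_cost(0.75, q))   # positive above the average
print(priority_cost(0.25, q))   # negative below the average
print(priority_cost(0.75, 0.6) < priority_cost(0.75, q))  # a higher average cheapens a high bid
```

The last line illustrates the inflation effect described above: the same bid costs less once the community average rises.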

We do not want the purchased priority to modify the demand 
of a job, for a simple reason. Should priority be allowed to 
modify a job’s demand, operation of a balance policy would col- 
lapse: the scheduler would fail to keep the balance set demand 
at the desired level because the demands of its members were not 
accurately reported. 

The position of a job within its queue depends on the part-
icular interpretation of the priority p it possesses. The two
possibilities are:

1. Fixed priority. An incoming job of priority p is placed
ahead of any job with priority less than p, but behind
any job with priority greater than or equal to p.

2. Percentile priority. If the priority range $(p_1, p_2)$ is
taken to be (0,1), then p may be interpreted as a per-
centile. That is, the user wishes to be always ahead
of 100p per cent of the jobs. An incoming job of pri-
ority p, arriving to a queue of length n, is placed a
distance (1-p)n from the front of the queue.

The fixed priority interpretation will in general mean that a 
user experiences different degrees of improved (or degraded) ser- 
vice, depending on the instantaneous demand of his job. For 
example, suppose his job oscillates between two demand classes, 
designated A and B, and there is a separate queue for each class. 
Let $p_A$ denote the largest priority of a class A job, $p_B$ the smal-
lest priority of a class B job, and $p_A < p_B$. Suppose the user hap-
pens to choose his priority to be p such that $p_A < p < p_B$. When in
class A, he receives the best of service; when in class B, the
worst. The percentile method circumvents this difficulty, always
giving the user the same improvement (or retardation) relative 
to other users. 
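
The two queueing interpretations can be compared with a small sketch. It is an illustration under assumptions: the queue is modeled as a Python list of (priority, job) pairs with the front at index 0, the function names are hypothetical, and rounding the distance (1-p)n down to a whole position is a choice made here, not one stated in the text.

```python
import math


def insert_fixed(queue, job, p):
    """Place job ahead of lower-priority jobs, behind jobs of priority >= p."""
    i = 0
    while i < len(queue) and queue[i][0] >= p:
        i += 1
    queue.insert(i, (p, job))


def insert_percentile(queue, job, p):
    """Place job a distance (1 - p) * n from the front, for p in (0, 1)."""
    n = len(queue)
    position = math.floor((1 - p) * n)  # rounding down is an assumption of this sketch
    queue.insert(position, (p, job))


fixed = []
for name, p in [("a", 0.5), ("b", 0.9), ("c", 0.5), ("d", 0.2)]:
    insert_fixed(fixed, name, p)
print([job for _, job in fixed])

waiting = [(0.5, "w1"), (0.5, "w2"), (0.5, "w3"), (0.5, "w4")]
insert_percentile(waiting, "new", 0.75)
print([job for _, job in waiting])
```

In the percentile case the position depends only on p and the queue length, not on the priorities of the jobs already waiting, which is why the improvement is the same in every demand class.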

We conclude by noting an interesting way to implement bidding.
Each console is provided with a potentiometer, calibrated on the
range $(p_1, p_2)$, with which the user may continuously adjust his priority.
This can be enhanced by supplying a meter, also calibrated on
the range $(p_1, p_2)$, which indicates the current average priority
q across all users, so that the particular user can adjust his own
priority with respect to the average. The existence of such a
meter constitutes instantaneous feedback between an economic sys-
tem and the competitors: some very interesting inflation and de-
flation effects could occur, perhaps even resulting in conditions
very similar to those in the stock market in 1929.

CHAPTER 2


The Environment 


2.0. Introduction 

The environment, consisting of the hardware and the software
operating system, plays an important role in the resource alloc-
ation problem. The reader should already be familiar with the
concepts of virtual computer, of segmentation and paging [D8],
of program and address-mechanism structure [A1], of a process
and parallel processes [D9], and of virtual time. We shall re-
view these concepts here in order to establish the complete pic-
ture (as we see it) of a multiprocess computer system.

2.1. The Basic System


For ease of understanding both operation and design, it is 
usual to view the processing function and memory function separ- 
ately in a computer system. The processing function performs 
transformations on information stored by the memory function. 

The processing function is usually implemented by one or more 
processors, and the memory function by one or more memory modules. 

To satisfy the system objectives requiring expandability, 
reliability, and continuous availability, modular hardware con- 
struction is common: the processing function becomes a pool of 
identical processors with free and unrestricted access to a pool 
of identical memory modules. Removing (adding) a device from 
(to) a pool reduces (increases) the capacity of the pool. Within 
a pool each device is anonymous, there being no a priori assign-
ment of any particular task to any particular device.

The high cost of directly-addressable memory forces memory 
systems to consist of at least two levels: 

1. main memory. No information can be processed unless it
is present in main memory. Main memory is usually a
magnetic core memory, though it could just as well be
any other directly-addressable storage device, such as
a thin-film memory. Other terms for main memory are
primary memory and execution store.

2. auxiliary memory. Information which for one reason or
another cannot be stored in main memory is stored in aux-
iliary memory. Examples of auxiliary memory are drums,
disks, and tapes, although a slow-speed core memory might
also be used for this purpose. Other terms for auxili-
ary memory are secondary memory and backup store.

Main memory has relatively high cost, but also has rapid access
time; auxiliary memory has low cost, but also has slow access
time.

Initially we shall restrict attention to a computer with a 
two-level memory system, indicated by Figure 2-1. After having 
studied program models, we shall generalize to multilevel memory 
systems; this will be done in Chapter 8. 

We assume that the unit of information storage and transfer 
is the page. We suppose the capacity of main memory is M pages, 
and the capacity of auxiliary memory is infinite. 

The N processors and M main memory pages will be called the 
equipment. For generality we assume that only a fraction $\alpha$, for
$0<\alpha<1$, of the N processors are available, and that only a frac-
tion $\beta$, for $0<\beta<1$, of the M memory pages are available. The $\alpha N$
processors and the $\beta M$ memory pages constitute the available
equipment. It is against the available equipment that we want 
to balance demand. 

We suppose that each processor can deliver one reference 
per unit time, and that each item in main memory can be refer- 
enced no more than once per unit time, so that the processor and 
main memory speeds are matched. This unit of time will be called 
a virtual time unit (vtu). 

There is a time T, the traverse time, involved in moving
one page between memory levels. T is measured from the moment
a page is found to be missing from main memory until the moment
the missing page has been placed in main memory ready for use.
T is actually the expectation of a random variable composed of
waits in queues, access times, mechanical positioning delays,
and transmission times. We shall regard the traverse time T as
being the same regardless of which direction a page is moved.


Figure 2-1. Basic System, with two-level memory. (N processors; MAIN memory of M pages; AUXILIARY memory of infinite capacity; page traffic between the two levels, with traverse time T.)

Dividing memory into two levels creates the first allocation
problem: storage management, the problem of deciding which infor-
mation is to reside in main memory and which is not. Generally,
the least-used information must be stored in auxiliary memory;
the most-used information must be ready for use in main memory.
When a processor makes a reference to a page not in main memory,
a page fault occurs, initiating action to secure the missing page
from auxiliary memory. We thus assume pages are brought into
main memory on demand only. Because not every useful page may
reside in main memory, there will be a flow of information,
called page traffic, along the channel bridging the two levels.
The activity of moving pages in and out of main memory is called
page-turning, or simply paging.

Nowhere in Figure 2-1 have we indicated the existence of
input-output equipment, the media used by programs to commun-
icate with the outside world, because we are not concerned with
this type of allocation in this thesis (see Section 1.2).

2.2. Multiprocess Computer System Concepts


Two basic principles in the design of multiprocess computer
systems are the abstractions of name space from
memory, and of process from processor. In the interest of program-
ming generality, a user is given the illusion that he is dealing
with a (configuration-independent) virtual computer. The virtual
computer comprises one or more virtual processors each having
most of the capabilities of a real processor, and a virtual mem-
ory having many times the capacity of the real memory. Because
the virtual memory has so large a capacity, the user sees no
auxiliary memory; for this reason virtual memory is often called
a one-level store [K1]. It is the task of the operating system
both to simulate virtual memory by paging information into real
memory, and to simulate virtual processors with real processors.
In Multics, the traffic controller mechanism [S2] handles assign-
ment of real processors to virtual processors, and communication
among virtual processors.

The first abstraction, name space, is the set of names 
(addresses) available to a virtual processor for use as data 
identifiers. 

For convenience (to the user) the name space is divided 
into segments, of arbitrary size. To reference a datum, a two- 
component address (S,W) is given, S being the name of a segment, 
and W being the name of a word within S. Because names have two 
components, the name space is often called two-dimensional. 
There is no a priori relation between a name in name space and
the location of the corresponding datum in physical memory; this
correspondence is established dynamically by the address-mapping
mechanism [A1].

For convenience (to the system) in mapping segments of ar-
bitrary size into a memory of fixed size, segments and real mem-
ory are divided into equal-size blocks, called pages. The page,
invisible to the programmer, is the standard unit of information
storage and transmission. We may thus regard the name space as
being sliced into equal-size regions.

Associated with each segment is a page table (itself a page) 
listing each page of the segment. If a page is not in main mem- 
ory, an in-core bit of the corresponding page table entry is OFF; 
an attempt by a virtual processor to reference such a page auto- 
matically causes a missing page fault, which interrupts execution 
of the virtual processor and initiates action to secure the mis- 
sing page from auxiliary memory. After a lapse of at least one 
traverse time (T) the page has been placed in main memory and is 
ready for use; the proper page table entry is set to point to 
the physical memory location of the start of the page, the in- 
core bit is turned ON, and execution of the interrupted virtual 
processor is resumed. Later on, when the page is removed from 
main memory, the in-core bit of the corresponding page table entry 
is again turned OFF. 
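
The page table mechanism described above can be sketched as follows. This is a simplified illustration, not a description of any particular system: the class and method names are hypothetical, a list of free frame numbers stands in for main memory, and the traverse time T is not modeled (the page simply becomes available at once).

```python
class PageTableEntry:
    def __init__(self):
        self.in_core = False   # the in-core bit; OFF while the page is not in main memory
        self.frame = None      # physical location of the start of the page


class Segment:
    def __init__(self, n_pages):
        self.page_table = [PageTableEntry() for _ in range(n_pages)]
        self.faults = 0

    def reference(self, page, free_frames):
        """Reference a page; a missing page fault loads it on demand."""
        entry = self.page_table[page]
        if not entry.in_core:
            self.faults += 1                 # missing page fault interrupts execution
            entry.frame = free_frames.pop()  # stands in for the transfer from auxiliary memory
            entry.in_core = True             # in-core bit turned ON
        return entry.frame

    def remove(self, page, free_frames):
        """Remove a page from main memory and turn its in-core bit OFF."""
        entry = self.page_table[page]
        if entry.in_core:
            free_frames.append(entry.frame)
            entry.frame = None
            entry.in_core = False


frames = [2, 1, 0]
segment = Segment(4)
segment.reference(3, frames)   # first reference faults and loads the page
segment.reference(3, frames)   # second reference finds the in-core bit ON
print(segment.faults)
```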

It is apparent that pages are on a lower level of abstrac-
tion than segments. The operating system should not attempt to
have each page of a segment in main memory; it should instead
attempt to have each useful page in main memory. For it is pos-
sible that only some of a segment’s pages are in use, and there
is no need to strain main memory resources by keeping useless
pages there. Roughly speaking, a working set of pages is the
smallest collection of pages that must be present in main memory
for a program to operate efficiently. Storage allocation should
attempt to keep at least the working set of each running program
in main memory.

The second abstraction, process, is the notion of a program
in execution by a virtual processor. In our work here, we use
the equivalent definition: a process is an ordered sequence of
references to information in name space, under the control of
an instruction stream. A process is sometimes referred to as
a thread of control through an instruction sequence. A process
has four states of existence in real time:

1. running, meaning that it is receiving the use of a real
processor; alternatively, that a real processor is as-
signed to its virtual processor.

2. ready, meaning that it is demanding, but not receiving,
the use of a real processor; alternatively, it is sus-
pended only because no real processor is currently as-
signed to its virtual processor.

3. page wait, meaning that it is temporarily suspended be-
cause a page is missing from main memory. Execution is
resumed as soon as the missing page has been placed in
main memory and a processor is available. We take the
duration of a page wait to be the traverse time T.

4. blocked, meaning it has no use for a real processor be-
cause it is awaiting the occurrence of some (expected)
external event, such as a message or signal from another
process, from a device, or from a user at a console.

Figure 2-2 illustrates the possible transitions among these states.
The transition from running to ready under a pre-emption means
that the operating system has required the real processor for
some other use, for example to execute another process. The
transition from ready to running under go-ahead means that the
operating system has decided to return the processor to this
process.

Figure 2-2. States of a process.

In Figure 2-2 we have indicated a transition from page wait
directly back to running, when in fact this need not be the case.
It is the case if a processor is dedicated to the process, being immedi-
ately able to resume execution when the process returns from page wait.
But, if the page wait time T is larger than the time it takes 
to switch the processor to another process, it is uneconomical 
to dedicate a processor to a single process, and in this case 
a process returns to running status via ready status. In our 
work here, we assume a sufficiency of processor resources, so 
that at worst a negligible delay is experienced by a process as 
it passes through ready to running. This is the justification 
for the direct page-wait to running transition shown in Figure 2-2. 
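
The four process states and their transitions can be written as a small table-driven state machine. The sketch is an illustration under assumptions: only the event names "pre-emption" and "go-ahead" appear in the text; the other event names, and the arc from blocked to ready, are hypothetical choices made here because the figure itself is not reproduced.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    READY = "ready"
    PAGE_WAIT = "page wait"
    BLOCKED = "blocked"


# (state, event) -> next state. Event names other than "pre-emption" and
# "go-ahead", and the blocked -> ready arc, are assumptions of this sketch.
TRANSITIONS = {
    (State.RUNNING, "pre-emption"): State.READY,
    (State.READY, "go-ahead"): State.RUNNING,
    (State.RUNNING, "page fault"): State.PAGE_WAIT,
    (State.PAGE_WAIT, "page arrival"): State.RUNNING,  # direct return, as assumed in the text
    (State.RUNNING, "wait for event"): State.BLOCKED,
    (State.BLOCKED, "event occurs"): State.READY,
}


def step(state, event):
    """Return the next state, or raise ValueError if the table has no such transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event!r}") from None


state = State.READY
for event in ["go-ahead", "page fault", "page arrival", "pre-emption"]:
    state = step(state, event)
print(state.value)
```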

When talking about processes we shall make a distinction 
between virtual time (vt) and real time. Virtual time is time 
as seen by a process as if it were never interrupted; that is, 
the total accumulated time in the running state. Virtual time, 
also called execution time or process time, is measured in vir-
tual time units (vtu), usually memory cycles. Put another way,
a virtual time unit is the interval between any two of the suc-
cessive information references that constitute a process. We
shall usually regard virtual time as being continuous, even though 
it is actually finely divided into small units. Finally, real 
time is virtual time with page wait, blocked, and ready delays 
inserted appropriately. 
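
The relation between virtual and real time can be made concrete with a short calculation. This is a sketch under assumptions: the function and parameter names are hypothetical, each page wait is taken to last exactly one traverse time T (as assumed earlier in this section), and all quantities are expressed in virtual time units.

```python
def real_time_elapsed(virtual_time, page_faults, traverse_time,
                      ready_delay=0.0, blocked_delay=0.0):
    """Real time = virtual time + page waits (one traverse time each) + other delays."""
    return virtual_time + page_faults * traverse_time + ready_delay + blocked_delay


# A process that makes 10,000 references and suffers 5 page faults,
# with a hypothetical traverse time of 1,000 vtu and no other delays.
print(real_time_elapsed(10_000, 5, 1_000))
```

With negligible ready delay, as assumed in this work, the real time of a process that never blocks is determined by its virtual time and its page fault count alone.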

When we talk about the virtual time interval $(t-\tau, t)$ we shall
mean the $\tau$ information references prior to the real time-instant t.

Because a process is an ordered sequence of information re-
ferences it is often called a sequential process [D11]. In a
multiprocess computer system, many processes may be executed con-
currently, or in parallel. Thus, we may speak of parallel se-
quential processes.

We define a computation to be a collection of mutually co- 
operating processes and information, all operating in the same 
name space. In Multics [C8, S2, V2] every computation is a
single-process computation, since there is a one-to-one cor-
respondence between a process and a name space; however a pro-
grammer can execute a program with parallel processes by setting
up a collection of single-process computations with isomorphic
name spaces. IBM System 360 [R3], RCA Spectra 70 [O2], and THE-
Multiprogrammed System [D11] are other examples of systems using
single-process computations. The Illiac IV [S6] is an example 
of a system using multiprocess computations. 

The constraints among the member processes of a computation 
are, from the resource allocation viewpoint, unspecified and must 
be considered arbitrary. For the very same reasons that compilers 
and programmers cannot specify beforehand the resource needs
of their programs (because of arbitrary timing of parallel pro- 
cesses and data dependence), compilers and programmers cannot 
predict the constraints among parallel processes. 

By the term contemporary computer system we shall mean a 
Multics-like system, characterized by single-process computations. 
Such systems are not geared for a high degree of intra-computation 
parallel programming because in them the tables specifying a
computation are so ponderous that the cost of spawning new pro-
cesses is prohibitive.

We assume that the operating system allocates resources to
computations, rather than to processes individually. Thus a
commitment must be made to grant a computation all the processors
and all the memory it needs. In a contemporary computer system,
the notion of scheduling a process is the same as this more
general notion of scheduling a computation.

2.3. Summary


We have reviewed the basic concepts of the computing en-
vironment, presuming familiarity with such common notions as
segments, pages, demand paging, page traffic, virtual computer,
virtual processor, and virtual memory. Terms whose meaning is
important in this thesis are:

1. process: a sequence of information references.

2. states of a process: running, ready, page wait, blocked.

3. virtual time: time seen by a running process.

4. computation: a family of cooperating processes and
information within the same name space.

We turn attention in the next chapters to the definition
and characterization of the working set model for program behavior.

CHAPTER 3


The Working Set Model for Program Behavior 


3.0. Introduction


We introduce and justify here the most basic concept in 
this thesis: the locality of information references. This is 
the property of program behavior that, during any interval of 
execution, the program favors a subset of its information. A 
working set of information dynamically measures this set of 
favored pages. A working set memory allocation strategy guaran- 
tees each running process that its working set shall be present 
in main memory. We shall show that working set strategies are 
optimum in two senses: minimum cost and minimum sensitivity to 
thrashing. 

First, we will say that a strategy is optimum when it pro- 
duces minimum cost (the product of memory space and time). After 
discussing various strategies, we show that working set strate- 
gies result in minimum cost. The proof is based on certain con- 
vexity properties, which follow from locality, of the cost function. 

Second, we investigate the causes of thrashing, and show 
that working set strategies minimize the possibility of thrashing. 

We conclude the chapter with a survey of the literature,
best done in the light of the working set model.

3.1. Locality and Working Sets


3.1.1. Definition and Justification 

Throughout this thesis we shall assume that locality is a 
fundamental property of program behavior. Locality is the pro- 
perty that, during any interval of execution, a process will favor 
some of its pages more than others; during disjoint virtual time 
intervals, the set of favored pages may be different. Put an- 
other way, if one observes a process’s reference pattern for some 
virtual time interval, he will see that the process does not 
scatter its references uniformly across its information. There 
are at least five factors motivating this assumption: 

1. Sequential instruction streams. Both programmers and
compilers tend to organize sequentially the instructions 
that direct the activity of a process; this is especi- 
ally true in single-address machines (i.e., those with 
a program counter). If a process fetches an instruction 
from a given page, it is highly probable that it will 
soon fetch another instruction, in sequence, from the 
same page. 

2. Functional modularity. Program modules are organized 
and executed by function. 

3. Content-related data organization. Information is us- 
ually grouped by content into segments, and is normally 
referenced that way; thus, references will occur in 
clusters to a content-related region in name space. 

4. Looping. Programs often loop within a set of pages.

5. People. Realizing that their programs will run on a
paged machine and that page transfers are costly, pro-
grammers tend to organize their algorithms so that ac-
tivity is localized within subsets of their information.
Moreover, people have been studying methods of minimizing
interpage references at execution time; see refer-
ences [B1,B2,C5,M1,O1,R1].

Experimental evidence suggests that this assumption, locality,
is a very good assumption. Suppose to the contrary that, during
every virtual time interval, a process scatters its references
uniformly over its information. Suppose that a fraction s
($0<s<1$) of its pages have been placed in main memory. Let
$\mu(s)$ be the fraction of its references the process makes to the
set of pages not in memory; since the references are uniformly
scattered, it follows that

$$
\mu(s) = 1 - s
$$

Experimental evidence, illustrated in Figure 3-1, contradicts
this [B1,V1]. As measured, $\mu(s)$ actually follows some curve that
lies below the curve $\mu(s) = 1-s$. It has been observed that there
is some number $s_0$, and constant $k>1$, such that if $s \le s_0$, then
$\mu(s) = 1-ks$; that is, the process is scattering its references
uniformly over only a subset of its information. The numbers
$s_0$ and k depend on the particular program and the particular
storage management rule used to decide what information is to
reside in main memory.
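
The contrast between uniform scattering and locality can be checked on synthetic reference strings. The sketch below is an illustration only: the reference strings are invented, the function names are hypothetical, and the storage management rule (keep the most frequently referenced pages) is an assumption chosen for simplicity.

```python
from collections import Counter


def missing_fraction(references, resident):
    """Fraction of references directed to pages outside the resident set."""
    missing = sum(1 for page in references if page not in resident)
    return missing / len(references)


def most_used(references, count):
    """Hypothetical storage rule: keep the `count` most frequently referenced pages."""
    return {page for page, _ in Counter(references).most_common(count)}


n_pages = 10
uniform = list(range(n_pages)) * 10               # every page referenced equally often
local = [0, 1, 2] * 30 + list(range(3, n_pages))  # most references go to three pages

for name, refs in [("uniform", uniform), ("local", local)]:
    resident = most_used(refs, n_pages // 2)      # half the pages in memory, s = 0.5
    print(name, missing_fraction(refs, resident))
```

For the uniform string the missing fraction equals 1 - s, while for the string exhibiting locality it falls well below that value.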

We will therefore assume that locality is a property of 
program behavior. 

We define the working set of information $W(t,\tau)$ of process
p at time t to be the set of pages that process p has referenced
during the virtual time interval $(t-\tau, t)$. The idea is illus-
trated in Figure 3-2.


Figure 3-1. Evidence supporting locality. (Vertical axis: $\mu(s)$; curve label: actual.)

Figure 3-2. Definition of a working set. (The pages referenced in the interval $(t-\tau, t)$ of virtual time for process p constitute $W(t,\tau)$.)

The validity of the working set model rests on the concept
of locality. A working set $W(t,\tau)$ measures the set of pages
process p is favoring at time t. Assuming that process p is
not likely to abruptly change its set of favored pages, the
working set $W_p(t,\tau)$ constitutes a reliable estimate of p’s
immediate memory needs. To put it another way, we are assuming
that, on the average,

$$
\Pr[\text{page } i \text{ referenced next} \mid i \in W(t,\tau)] >
\Pr[\text{page } i \text{ referenced next} \mid i \notin W(t,\tau)]
$$

The working set parameter $\tau$ should be chosen as small as
possible, and yet assure that $W(t,\tau)$ contains p’s favored pages.
Thus, $\tau$ may vary from program to program, and from time to time.
We shall discuss details of choosing $\tau$ in Chapter 4.

We assume that the page size (i.e., the number of words in 
a page) is chosen small enough so that the working set $W(t,\tau)$
always consists of at least several pages. Indeed, if in a
particular computer system we observed that working sets often 
consisted of only one or two pages, we would begin to suspect 
that a smaller page size might result in smaller working sets 
and in smaller memory requirements for programs. 

Intuitively, a working set is the smallest set of infor- 
mation that ought to reside in main memory so that a process can 
operate efficiently. A working set memory management policy is 
one that permits a process to be running if and only if there is 
enough uncommitted space in main memory to contain its working set. 
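
The working set definition and the admission rule of a working set policy can be sketched directly from a reference string. This is an illustration under assumptions: virtual time is taken to be discrete (one reference per virtual time unit), the window is read as the last tau references before time t, and the function names and the example reference string are hypothetical.

```python
def working_set(references, t, tau):
    """Pages referenced in the last tau references before virtual time t.

    references[k] is the page referenced at virtual time k + 1, so the window
    covers virtual times t - tau + 1 .. t, a discrete reading of (t - tau, t).
    """
    start = max(0, t - tau)
    return set(references[start:t])


def may_run(references, t, tau, uncommitted_pages):
    """Working set policy: run only if the working set fits in uncommitted space."""
    return len(working_set(references, t, tau)) <= uncommitted_pages


refs = [1, 2, 1, 3, 2, 2, 4, 4, 4, 1]   # hypothetical page reference string
print(working_set(refs, t=5, tau=4))
print(working_set(refs, t=10, tau=4))
print(may_run(refs, t=5, tau=4, uncommitted_pages=2))
```

A larger tau can only enlarge the set, which is why the parameter should be chosen as small as possible while still capturing the favored pages.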

Define the random variable x to be the virtual time interval
between successive references to the same page. These inter-
reference intervals x are useful for describing certain program
properties, which we will do in detail in Chapter 4. Let
$F_x(u) = \Pr[x<u]$ denote its distribution function and let $\bar{x}$
denote its mean. A working set is the collection of a process’s
pages whose current interreference intervals (in virtual time)
satisfy $x<\tau$.
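
Interreference intervals are easy to measure from a reference string. The sketch below is an illustration under assumptions: virtual time is discrete, the function names and the example string are hypothetical, and the distribution function is estimated empirically from the observed intervals.

```python
def interreference_intervals(references):
    """Virtual time intervals between successive references to the same page."""
    last_seen = {}
    intervals = []
    for time, page in enumerate(references, start=1):
        if page in last_seen:
            intervals.append(time - last_seen[page])
        last_seen[page] = time
    return intervals


def distribution(intervals, u):
    """Empirical estimate of Pr[x < u]."""
    return sum(1 for x in intervals if x < u) / len(intervals)


def mean(intervals):
    """Sample mean of the interreference intervals."""
    return sum(intervals) / len(intervals)


refs = [1, 2, 1, 3, 2, 2, 4, 4, 4, 1]   # hypothetical page reference string
xs = interreference_intervals(refs)
print(xs)
print(mean(xs))
print(distribution(xs, 3))
```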

By a program we mean the set of information to which a 
process directs its references. There is a relation between the 
size of a program and the lengths of the interreference intervals 
to its component pages. Let process 1 be associated with program
$P_1$, and process 2 be associated with program $P_2$, and let $P_1$ be
larger than $P_2$. Then process 1 has to scatter its references
across a wider range of pages than process 2, and we expect that
the interreference intervals $x_1$ of process 1 will be longer than
the interreference intervals $x_2$ of process 2. That is, $P_1$ bigger
than $P_2$ implies $\bar{x}_1 > \bar{x}_2$.


3.1.2. Pictorial Representations 


It is useful to develop some pictorial representations for
the notions of working set and locality. Let C be a computation
and M be the name space used by C; we may imagine that elements
of M have been grouped together, by pages. We may associate with
C a process space P whose elements are the processes (sequences
of information references) of C. If C is a single-process com-
putation, P contains just one sequence. If C is a multiprocess
computation, P contains several sequences. In Figure 3-3 we show
a process p in P; the directed line suggests the ordering of the
information references constituting p; two of the information
references, one at time t, the other at time $(t-\tau)$, have been
singled out. We may imagine that $W_p(t,\tau)$ is a projection of the
virtual time interval $(t-\tau, t)$ into M. Adjacencies indicated in
M (i.e., the content of $W_p(t,\tau)$) should not be construed as ad-
jacencies of address values; they are simply adjacencies of re-
ferences in virtual time.

Figure 3-3.

Figure 3-4 depicts the assumption that the content of
$W_p(t,\tau)$ is not fast-changing. For small time separations $\alpha$, we
expect a large intersection between $W_p(t,\tau)$ and $W_p(t+\alpha,\tau)$. For
large time separations $\beta$ (with $\beta \gg \alpha$ and $\beta \gg \tau$) we do not expect
an intersection between $W_p(t,\tau)$ and $W_p(t+\beta,\tau)$ because p has had
ample opportunity to finish the work of time t by time $(t+\beta)$.
Put another way, we expect a working set $W_p(t,\tau)$ to be a reliable
estimate of p’s memory needs only over a short interval.

Figure 3-5 illustrates the situation for a multiprocess com- 
putation C. Let P(C,t) denote the processes in C at time t that 
are running or in page wait (i.e., receiving the use of resources). 


The information that should be in main memory is

$$W_C(t,\tau) = \bigcup_{p \in P(C,t)} W_p(t,\tau)$$
Note that some of the working sets may overlap, because processes 
may share information. 
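
As an illustration, the union above can be written directly with set operations. The following is a minimal sketch in Python and is not part of the original formulation: the names `working_set` and `computation_working_set`, the representation of a process as a list of page numbers, and the simplifying assumption that all processes share one virtual clock are ours.

```python
from typing import Dict, List, Set


def working_set(refs: List[int], t: int, tau: int) -> Set[int]:
    """Pages referenced in the last tau references up to virtual time t.

    refs[k] is taken to be the page referenced at virtual time k + 1, so the
    window is refs[t - tau : t], clipped at the start of the string.
    """
    return set(refs[max(0, t - tau):t])


def computation_working_set(procs: Dict[str, List[int]],
                            active: Set[str], t: int, tau: int) -> Set[int]:
    """Union of the working sets of the active processes P(C, t)."""
    pages: Set[int] = set()
    for name in active:
        pages |= working_set(procs[name], t, tau)
    return pages


# Hypothetical two-process computation; page 3 is shared by both processes.
procs = {"p1": [1, 2, 3, 1, 2], "p2": [3, 4, 3, 4, 5]}
shared = computation_working_set(procs, {"p1", "p2"}, t=5, tau=3)
```

Because the result is a set, a page shared by several processes is counted once, which is the overlap remarked on above.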
Note further that P(C,t) may be regarded as a working set
of processes in the process space P.

Figure 3-4. Time movement of a working set.

Figure 3-5. Working sets for multiprocess computation C.

3.1.3. Interactions

The foregoing discussion deals with locality concepts during 
virtual time intervals that contain no interactions. An inter- 
action is an instant in virtual time at which the process stops 
to wait for a message. What happens to our definitions if the 
virtual time interval contains an interaction? 

When a process stops (blocks) for an interaction, it seeks
a message or signal from another process, from a device, or from
a user at a console. An interaction has two properties of
interest to us:

1. The process enters the blocked state where it may remain,
unpredictably, for a long time.

2. The message received by the process may affect its
behavior following the interaction.

This second property means that, if $t_0$ is an interaction instant,
the working set $W_p(t_0,\tau)$ may not be a good estimate of any working
set $W_p(t,\tau)$ for $t > t_0$, because the message may seriously alter
p’s behavior.

What we will do is assume that the working sets before and 
after an interaction intersect, though not completely. We believe 
that the expected size of the intersection will tend to decrease 
with long blocked-intervals, because in longer time intervals, 
for example, a user will have more opportunity to change his mind 
and alter the behavior of his process. Conversely, the shorter 
the duration of a blocked-interval, the greater the expected size 
of the intersection between the working set before and after the
interaction.

An example is helpful. Figure 3-6 illustrates a program
organization likely to be typical of modular, interactive programs.
The user sends requests to the interface procedure A; having
interpreted the request, A calls on one of the procedures
$B_1, \ldots, B_n$ to perform an operation on the data D. The called
B-procedure then returns to A for the next user request.
Interactions occur whenever the process enters A to await a message.
A program organization such as this might be used (for example)
in an editing program. Just before the interaction, the working
set will contain A, D, and one B-procedure. Just after the
interaction, the working set will contain A. The intersection is
just A.

A study of intersections of working sets before and after 
interactions is needed in order to assess the value of look-ahead 
when a process unblocks. 

Because we are not interested in input-output allocation 
in this thesis, we will no longer be concerned with the effects 
of interactions on program behavior; from now on we assume that 
virtual time intervals contain no interactions. This problem
has been studied in reference [D5].

Figure 3-6. Organization of an interactive modular program.

3.2. Convexity [W2, p.563]


We shall prove a theorem about convex functions which will 
be of great importance in the following sections. 


A function f(x) is strictly convex on an interval I if its
second derivative is negative:

$$f''(x) < 0, \qquad x \in I$$

and f(x) is strictly concave if its second derivative is positive.
If the second derivative is zero on an interval, we regard the
function as being either convex or concave on that interval.

Since f is convex if and only if -f is concave, we may restrict
attention to convex functions. Figure 3-7 shows a strictly
convex function; note that every line segment connecting two points
on the curve lies below the curve.


Theorem 3.1. Suppose x is a random variable on an interval I,
where its probability density function $p_x(u)$ satisfies

$$\int_I p_x(u)\,du = 1, \qquad \int_I u\,p_x(u)\,du = \bar{x}$$

Suppose also that f is a strictly convex function on I.
Then

$$\overline{f(x)} \le f(\bar{x})$$

and equality holds if and only if $p_x(u) = \delta(u-\bar{x})$ (the
impulse function).


Proof. Since $\bar{x}$ is a fixed number we may expand f(x) around the
point $\bar{x}$ using Taylor’s expansion:

$$f(x) = f(\bar{x}) + (x-\bar{x})\,f'(\bar{x}) + \frac{(x-\bar{x})^2}{2}\,f''(z), \qquad \text{some } z \in I$$

Since $f''(z) < 0$ for $z \in I$,

$$f(x) \le f(\bar{x}) + (x-\bar{x})\,f'(\bar{x})$$

Taking expectations on both sides, and noting that $f(\bar{x})$
and $f'(\bar{x})$ are constant,

$$\overline{f(x)} \le f(\bar{x}) + \overline{(x-\bar{x})}\,f'(\bar{x})$$

but since $\overline{(x-\bar{x})} = 0$,

$$\overline{f(x)} \le f(\bar{x})$$

The equality clearly holds if and only if $x = \bar{x}$ with
probability 1.

QED.


In Figure 3-7 we show a geometric interpretation of the theorem
for the simple case

$$p_x(u) = \begin{cases} \tfrac{1}{2} & u = \bar{x}+\epsilon \text{ or } u = \bar{x}-\epsilon \\ 0 & \text{otherwise} \end{cases}$$

and it is clear that

$$\overline{f(x)} \le f(\bar{x})$$

Observe from the definition and the figure that, for f to be
convex on an interval I, it is sufficient that

$$f(x+\epsilon) + f(x-\epsilon) \le 2 f(x)$$

for all choices of x and $\epsilon$ such that $(x+\epsilon)$ and $(x-\epsilon)$ are in I.

Figure 3-7. Illustrating convexity theorem.

3.3. Working Set Size

Let $W(t,\tau)$ be a working set. Define the working set size
$w(t,\tau)$ to be:

$$w(t,\tau) = \text{number of pages in } W(t,\tau) = |W(t,\tau)|$$

We assume that the working set size $w(t,\tau)$ is a stationary
stochastic process, so that the time expectation $\overline{w(t,\tau)}$ is
independent of t, and we may write

$$w(\tau) = \overline{w(t,\tau)}$$

where we understand overbar to mean time expectation.


Theorem 3.2. The expected working set size $w(\tau)$ has these
properties:

1. $w(\tau) \le \tau$

2. $w(0) = 0$

3. $w(\tau+\Delta) \ge w(\tau)$, $\Delta \ge 0$ (non-decreasing)

4. $w(\tau)$ is convex.


Proof: Since the maximum number of distinct references that can
occur in $\tau$ vtu is $\tau$, we have $w(t,\tau) \le \tau$ and hence $w(\tau) \le \tau$.
That $w(0) = 0$ is clear since no pages can be referenced in
zero time. That $w(\tau+\Delta) \ge w(\tau)$ is also clear since more pages
can be referenced in longer intervals. To show that $w(\tau)$
is convex, we will show that for all choices of $\tau$ and $\epsilon$ such
that $(\tau-\epsilon) \ge 0$,

$$2w(\tau) \ge w(\tau+\epsilon) + w(\tau-\epsilon)$$

So, let $\tau$ and $\epsilon$ be arbitrarily given, with $(\tau-\epsilon) \ge 0$. Refer
to Figure 3-8. Observe that

$$W(t,\tau) \cup W(t-\tau,\tau) = W(t,\tau-\epsilon) \cup W(t-(\tau-\epsilon),\tau+\epsilon)$$

Using $|X \cup Y| = |X| + |Y| - |X \cap Y|$ for any sets X and
Y, we have

$$w(t,\tau) + w(t-\tau,\tau) = w(t,\tau-\epsilon) + w(t-(\tau-\epsilon),\tau+\epsilon) + |A| - |B|$$

where

$$A = W(t,\tau) \cap W(t-\tau,\tau)$$

$$B = W(t,\tau-\epsilon) \cap W(t-(\tau-\epsilon),\tau+\epsilon)$$

Taking the time expectation on both sides,

$$w(\tau) + w(\tau) = w(\tau-\epsilon) + w(\tau+\epsilon) + \overline{|A|} - \overline{|B|}$$

We claim $\overline{|A|} - \overline{|B|} \ge 0$. To see this, note that $|A| \le w(\tau)$
and $|B| \le w(\tau-\epsilon)$, so $|A|$ is potentially bigger than $|B|$.
During the operation of averaging over all t, any page that
appears in B must also appear in A. Thus, on the average
at least as many pages appear in A as in B; hence we have
$\overline{|A|} - \overline{|B|} \ge 0$ and the required inequality follows.

QED.

Figure 3-8. For the proof of Theorem 3.2.


Note that properties 1-3 of Theorem 3.2 apply also to the random
variable $w(t,\tau)$ itself, but property 4 applies only to $w(\tau)$.
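
Properties 1-3 can be checked mechanically on any reference string. The sketch below is only an illustration under our own assumptions: a process is modelled as a list of page numbers, one reference per unit of virtual time, and `demo_refs` is a hypothetical string with slowly drifting locality. The names `ws_size` and `mean_ws_size` are ours.

```python
from typing import List


def ws_size(refs: List[int], t: int, tau: int) -> int:
    """w(t, tau): number of distinct pages among the last tau references
    up to virtual time t (the window is clipped at the start of the string)."""
    return len(set(refs[max(0, t - tau):t]))


def mean_ws_size(refs: List[int], tau: int, t_start: int) -> float:
    """Time average of w(t, tau) over t = t_start .. len(refs)."""
    sizes = [ws_size(refs, t, tau) for t in range(t_start, len(refs) + 1)]
    return sum(sizes) / len(sizes)


# Hypothetical reference string: three favored pages at a time, with the
# favored set shifting by one page every eight references.
demo_refs = [(i // 8) + (i % 3) for i in range(120)]

# Estimated w(tau) for tau = 0..10, averaged over the same range of t.
curve = [mean_ws_size(demo_refs, tau, t_start=10) for tau in range(11)]
```

On a finite string the time average is only an estimate of $w(\tau)$, so property 4 need not hold exactly for `curve`; properties 1-3 hold for every t.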

In Figure 3-9 we have sketched $w(\tau)$ for two kinds of
program. A hard, or incompressible, program is one with a
well-defined set of favored pages; a soft, or compressible, program is
one with a fuzzily-defined set of favored pages. A hard program
tends to scatter almost all of its references uniformly over some
set of $\tau_0$ favored pages, so that for any interval $\tau \le \tau_0$ we expect
to see mostly distinct pages referenced, and $w(\tau)$ increases
(almost) linearly with $\tau$. For such programs we want to choose
$\tau \ge \tau_0$. An example of a hard program is the so-called
stream-processing program, whose algorithm is contained wholly in a
set of $\tau_0$ pages, and occasional references are made to a sequence
of data pages; after only a few references, each data page is
discarded forever. If $\tau \gg \tau_0$, the working set will contain many
useless pages. Choosing $\tau$ close to $\tau_0$ will not affect the
program’s operating efficiency, but will diminish the amount of
memory it occupies.

Figure 3-9. Expected working set size.

Recall the definition of the interreference intervals x
(the virtual time intervals between successive references to the
same page), with distribution function $F_x(u) = \Pr[x \le u]$, and
density function $f_x(u) = \frac{d}{du} F_x(u)$. We shall assume $F_x(u)$ is
convex. This is not unreasonable since it requires only that
$f_x(u)$ be decreasing, which is consistent with the concept of
locality. In fact, there is strong evidence that this type of
interarrival distribution is modelled nicely by a hyperexponential
distribution [C4,F4], which is convex.

We define the missing-page probability $\lambda(\tau)$ to be the
probability that a process directs its next reference to a page not
in the working set $W(t,\tau)$; under a working set memory allocation
strategy, such a page may be missing from main memory.


Theorem 3.3. Let $\lambda(\tau)$ = Pr[process references a page not in $W(t,\tau)$].
Then $\lambda(\tau) = 1 - F_x(\tau)$.

Proof: The probability the page referenced is not in $W(t,\tau)$ is
just the probability its most recent interreference interval
satisfies $x > \tau$, so $\lambda(\tau) = \Pr[x > \tau] = 1 - F_x(\tau)$.

QED.
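
The identity of Theorem 3.3 can be seen on a finite reference string by computing the miss rate in two ways. This is an illustrative sketch under our own assumptions, not a measurement from the thesis: a process is a list of page numbers, a first reference to a page is treated as an infinite interreference interval, and the function names are ours.

```python
from typing import List, Optional


def backward_intervals(refs: List[int]) -> List[Optional[int]]:
    """Interreference interval x for each reference: the distance back to the
    previous reference to the same page, or None for a first reference."""
    last_seen = {}
    intervals: List[Optional[int]] = []
    for t, page in enumerate(refs):
        intervals.append(t - last_seen[page] if page in last_seen else None)
        last_seen[page] = t
    return intervals


def miss_rate_by_window(refs: List[int], tau: int) -> float:
    """Fraction of references to a page absent from the working set formed
    by the preceding tau references."""
    misses = sum(1 for t, page in enumerate(refs)
                 if page not in refs[max(0, t - tau):t])
    return misses / len(refs)


def miss_rate_by_intervals(refs: List[int], tau: int) -> float:
    """Empirical Pr[x > tau], i.e. 1 - F_x(tau), with first references
    counted as x = infinity."""
    exceeding = sum(1 for x in backward_intervals(refs)
                    if x is None or x > tau)
    return exceeding / len(refs)


sample = [1, 2, 1, 3, 2, 1, 4, 1]  # hypothetical reference string
```

For every $\tau$, `miss_rate_by_window(sample, tau)` and `miss_rate_by_intervals(sample, tau)` agree, since a page lies in the window exactly when its interval does not exceed $\tau$.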
We will need the following two theorems to prove that a working
set strategy is optimum.

Theorem 3.4. Suppose $\tau$ is varied on some interval, with mean $\bar{\tau}$.
Then the average missing-page probability is increased:

$$\overline{\lambda(\tau)} \ge \lambda(\bar{\tau})$$

Proof: Since we assume $F_x(u)$ is convex, $\lambda(\tau) = 1 - F_x(\tau)$ is
concave, and by Theorem 3.1, we have $\overline{\lambda(\tau)} \ge \lambda(\bar{\tau})$.
QED.

Theorem 3.5. Suppose $\tau$ is varied on some interval, with mean $\bar{\tau}$.
Then the average working set size is decreased:

$$\overline{w(\tau)} \le w(\bar{\tau})$$

Proof: By Theorem 3.2, $w(\tau)$ is convex. By Theorem 3.1, we have
$\overline{w(\tau)} \le w(\bar{\tau})$.
QED.
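
A small numerical illustration of Theorem 3.4 may help. The sketch assumes a two-stage hyperexponential distribution for the interreference intervals; the mixing weight and the two rates are hypothetical values chosen only for the example, and the function names are ours. Note that "convex" in this chapter means a negative second derivative, which many texts call concave.

```python
import math
from typing import List, Tuple


def missing_page_probability(tau: float, a: float = 0.7,
                             rate_1: float = 1.0, rate_2: float = 0.05) -> float:
    """lambda(tau) = 1 - F_x(tau) for a two-stage hyperexponential F_x.

    The weight a and the rates are illustrative assumptions.
    """
    return a * math.exp(-rate_1 * tau) + (1.0 - a) * math.exp(-rate_2 * tau)


def averaged_versus_fixed(tau_values: List[float]) -> Tuple[float, float]:
    """Return (average of lambda over the tau values, lambda at their mean)."""
    mean_tau = sum(tau_values) / len(tau_values)
    average = sum(missing_page_probability(t) for t in tau_values) / len(tau_values)
    return average, missing_page_probability(mean_tau)


varied, fixed = averaged_versus_fixed([5.0, 15.0, 10.0, 30.0])
```

Here `varied` is never smaller than `fixed`, which is the inequality of Theorem 3.4 for this particular choice of $F_x$.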


Varying $\tau$ increases the probability that a missing page will be
referenced, as well as diminishing the average memory share held
by a process. That is, varying $\tau$ with mean $\bar{\tau}$ on some interval
has the same effect as holding $\tau$ fixed at some $\tau_0 \le \bar{\tau}$ such that

$$w(\tau_0) = \overline{w(\tau)}$$

3.4. Storage Management Policies


Storage management policies for multiprogrammed memories 
may be regarded as operating in two provinces: 

1. Fetching (page in): Locate the required page in auxiliary
memory, and load it into main memory; turn the
in-core bit of the corresponding page table entry ON.

2. Replacement (page out): Remove some page from main
memory, and turn the in-core bit of the corresponding page
table entry OFF. The policy rule that decides which
page to remove is called the replacement rule.

Management algorithms may be classified according to their me- 
thods of fetching and replacement. 

Fetch strategies may load pages before they are needed 
(pre-paging), at the moment they are needed (demand paging), or 
even later. Many strategies use demand paging; that is, no action
is taken to bring a page into main memory until some process
attempts a reference to it. Demand paging is usually preferred 
to pre-paging because it is much cheaper to implement, and be- 
cause it is not clear that pre-paging improves performance sig- 
nificantly. As we have stressed, advance information is often 
non-existent because there is no reliable source of allocation 
information. In fact the only major argument favoring pre-paging 
is the possibility of moving large contiguous blocks of pages 
from auxiliary memory so that the accumulated traverse time is 
reduced in the long run. Although traverse time reduction is 
(in some sense) a valid argument for pre-paging, we feel that 
it is also a more powerful argument for better, faster, aux- 
iliary memories. 

It may be argued that a working set, a supposedly reliable
estimate of a process’s immediate memory needs, is the ideal
set of pages to pre-load. Because the records required to keep
track explicitly of which pages belong to which working sets may
easily become so complicated that any benefits resulting from
pre-paging may be lost, we prefer to assume that fetching occurs
on demand only, via the page fault mechanism.

The major problem in memory management is not deciding which
pages to load; it is deciding which pages to replace. A storage
management policy should attempt to keep in main memory the pages
most likely to be used. Thus, the best choice for replacement
is the page with the least likelihood of being reused immediately.
Debate has arisen over which replacement, or page-turning,
strategy is best.

The cost of operating a program under a given strategy will 
be defined (Section 3.5.1) to be the amount of memory used times 
the duration of such use. We will say that the optimum strategy 
is the one that results in the lowest cost. In Section 3.5.1 
we will show that low missing-page probability is equivalent to 
low cost. We shall therefore use the missing-page probability 
as a measure of performance for a paging policy; this will be
done in Sections 3.5.2 and 3.5.3.

¹If a page has been modified since being placed in main memory,
replacing it involves transferring it into auxiliary memory;
an unmodified page is simply overwritten, provided there is
a copy in auxiliary memory.

Allocation of pages in multiprogrammed memories can be handled
on either a fixed or variable memory basis:

1. Fixed share. Before being run, a program is granted a
share of the memory for its private use.

2. Variable share. Programs are allowed to compete freely
for memory space. In principle, more aggressive programs
should be able to obtain a greater share of the
memory. In principle, as a program expands or contracts,
its share increases or decreases accordingly.

In Section 3.5.2 we shall prove that variable-share strategies
yield smaller missing-page probabilities than fixed-share
strategies, all other things being equal. Policy rules for
replacement (which may be used with either fixed or variable share
basic strategies) fall into the following three classes, ordered in
terms of the intrinsic increase in the logic required to implement:

1. Static rules, which use no information about page use;
these rules are very simple to implement.

2. Usage rules, which use information about page use,
generally measuring time intervals since the last reference
to each page.

3. Demand rules, which attempt to predict, on the basis
of recent reference patterns, the set of pages most likely
to be used immediately. A program is given more or less
space according to its demand for space.

We shall show in Section 3.5.2 that the static rules lead to
the highest missing-page probabilities, the demand rules the
lowest.

There are two static rules of interest:

1. Random (RAND). Whenever a fresh page of memory is needed,
a page is selected at random to be replaced. Implementation
is simple, requiring only a random-number
generator.

2. First-in, First-out (FIFO). Whenever a fresh page of
memory is needed, the page least recently paged in is
retired and another page brought in to fill the newly
vacated slot. Whereas RAND requires a random number
generator, FIFO requires only a counter, and implementation
is even simpler, as follows. The pages of main
memory are regarded as a cyclic group; suppose the M
pages of main memory are numbered 0,1,...,(M-1) and a
pointer k indicates that the kth page was most recently
paged in. When a fresh page is needed, [(k+1) mod M] → k,
and page k is retired.
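
The two static rules can be sketched in a few lines. This is a minimal illustration, not an implementation from any particular system: the class names, the `reference` method that returns True on a page fault, and the choice to fill empty slots before RAND starts replacing are all our assumptions.

```python
import random
from typing import List, Optional


class FifoFrames:
    """FIFO replacement with the cyclic pointer k described above."""

    def __init__(self, m: int) -> None:
        self.frames: List[Optional[int]] = [None] * m
        self.k = m - 1  # so that the first page loaded goes to frame 0

    def reference(self, page: int) -> bool:
        """Reference a page; return True if this caused a page fault."""
        if page in self.frames:
            return False
        self.k = (self.k + 1) % len(self.frames)  # [(k+1) mod M] -> k
        self.frames[self.k] = page                # old occupant of k is retired
        return True


class RandFrames:
    """RAND replacement: the victim frame is chosen by a random number."""

    def __init__(self, m: int, seed: int = 0) -> None:
        self.frames: List[Optional[int]] = [None] * m
        self.rng = random.Random(seed)

    def reference(self, page: int) -> bool:
        if page in self.frames:
            return False
        if None in self.frames:
            slot = self.frames.index(None)        # fill empty frames first
        else:
            slot = self.rng.randrange(len(self.frames))
        self.frames[slot] = page
        return True
```

For example, a `FifoFrames(3)` memory given the references 1, 2, 3, 1, 4, 1 retires page 1 when page 4 arrives, even though page 1 was referenced just before, and so faults again on the final reference.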

The principal argument for these two rules is their simplicity
of implementation. Yet the experimental evidence [B1,B2,V1]
indicates that usage rules, despite higher overhead, significantly
outperform the static rules.

There are two usage rules of interest, LRU and FINUFO:

3. Least recently used (LRU). Whenever a fresh page of
memory is needed, the page unreferenced for the longest
time is removed. Each page table entry contains a use
bit, set ON each time the page is referenced. At periodic
intervals, all page table entries are searched,
use bits reset, and usage records updated.

Unfortunately, implementation of an LRU rule may become
complicated, and it is not clear whether an overall improvement would
result.
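
For comparison with the static rules, here is a minimal sketch of LRU. It keeps an exact recency order in an ordered dictionary, which is a software convenience of ours and not the periodic use-bit scan described in item 3; the class name and the `reference` method are likewise illustrative.

```python
from collections import OrderedDict


class LruFrames:
    """Exact LRU over m page frames; least recently used page comes first."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.pages: "OrderedDict[int, bool]" = OrderedDict()

    def reference(self, page: int) -> bool:
        """Reference a page; return True if this caused a page fault."""
        if page in self.pages:
            self.pages.move_to_end(page)    # now the most recently used
            return False
        if len(self.pages) >= self.m:
            self.pages.popitem(last=False)  # retire the least recently used
        self.pages[page] = True
        return True
```

With three frames and the references 1, 2, 3, 1, 4, 2, the arrival of page 4 retires page 2 rather than page 1, because page 1 was used more recently.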


A very interesting rule¹ combines the simplicity of FIFO
with the sophistication of LRU:

4. First-in, Not-used, first-out (FINUFO). Implementation
is almost exactly that of FIFO, but now the use bits
come into play. Let k be the pointer that cycles through
the M pages of memory. Whenever a fresh page is needed,
k is incremented until a page is found with use bit OFF;
this page is retired. When k passes a page with use
bit ON, the use bit is turned OFF.
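
The pointer-and-use-bit mechanism of item 4 can be sketched as follows. The class name, the `reference` method, and the decision to set the use bit of a newly loaded page ON (because the faulting reference uses it) are our assumptions for the illustration.

```python
from typing import List, Optional


class FinufoFrames:
    """FINUFO: a FIFO pointer that skips, and clears, pages with use bit ON."""

    def __init__(self, m: int) -> None:
        self.pages: List[Optional[int]] = [None] * m
        self.use: List[bool] = [False] * m
        self.k = m - 1  # so that the first page loaded goes to frame 0

    def reference(self, page: int) -> bool:
        """Reference a page; return True if this caused a page fault."""
        if page in self.pages:
            self.use[self.pages.index(page)] = True
            return False
        while True:
            self.k = (self.k + 1) % len(self.pages)
            if not self.use[self.k]:
                break                    # found a page with use bit OFF
            self.use[self.k] = False     # pointer passes: turn the bit OFF
        self.pages[self.k] = page        # retire the old occupant of frame k
        self.use[self.k] = True
        return True
```

The loop always terminates: if every use bit is ON, the first pass turns them all OFF and the pointer then stops at the first frame it cleared.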


It is interesting to note that FINUFO is much closer to a demand
rule than to a usage rule, because when demand for main memory
is high, FINUFO will have difficulty in finding a page to remove
(many use bits ON). Whereas FIFO and LRU will always find a page
to remove, FINUFO may not. It is therefore a stable rule.

Another usage rule, primarily of academic interest, is:

5. ATLAS loop-detection. The Ferranti ATLAS computer [K1]
had a paging strategy that attempted to detect loop
behavior in page reference patterns, then minimize
page traffic by maximizing the time between page transfers;
that is, by removing pages not expected to be needed
for the longest time. Performance was satisfactory
for programs exhibiting loop behavior; unsatisfactory
for programs exhibiting aperiodic reference patterns,
because the algorithm attempted to predict loops when
there were none. Implementation was costly.

¹Reported (by J. H. Saltzer) to be used in Multics.

Two kinds of demand rules warrant investigation, biased
rules and working set rules:

6. Biased replacement rules. In round-robin fashion, each
program is favored for an interval of time. During its
favored interval, none of its¹ pages are removed and
it may acquire new pages without hindrance. After its
favored interval, it will be forced to give up pages in
deference to other programs. When a page is to be retired,
any of the rules discussed above may be applied
to the non-favored pages.

Belady [B2] reports that a biased FIFO rule on the M44/44X
computer improved performance significantly. The arguments given in
Section 3.5.4 may be used to show that biased rules will perform
better than non-biased rules (except the working set rule).
Intuitively this makes sense, because large programs will have
opportunity to expand into memory shares more matched with their
needs.

7. Working set (WS). Guarantees that a computation receives
the use of a processor if and only if there is
enough uncommitted space in memory to contain its working
set pages. Thus, every page belonging to the working
set of some running process must be kept in main memory.
Pages in no working set are subject to removal, though
need not be removed until the space is needed. A computation
acquires more or less memory in accordance with
fluctuations in its working set size. Should the totality
of working sets exceed memory, some program (perhaps
the one present there for the longest time) is removed
in order to clear space.
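
The two halves of the WS rule, keeping exactly the working set resident and admitting a computation only when its working set fits in uncommitted space, can be sketched as below. This is a simplified illustration under our own assumptions: one reference per unit of virtual time, a single process per computation, and admission considered in the order the computations are listed. The names `ws_run` and `admit` are ours.

```python
from typing import Dict, List, Tuple


def ws_run(refs: List[int], tau: int) -> Tuple[int, List[int]]:
    """Run one reference string with the working set resident.

    Returns the number of references to pages outside the working set, and
    the working set size after each reference.
    """
    faults = 0
    sizes: List[int] = []
    for t, page in enumerate(refs):
        if page not in refs[max(0, t - tau):t]:
            faults += 1
        sizes.append(len(set(refs[max(0, t + 1 - tau):t + 1])))
    return faults, sizes


def admit(ws_sizes: Dict[str, int], memory_pages: int) -> List[str]:
    """Admit computations, in the order given, while their working sets fit
    in the remaining uncommitted space."""
    running: List[str] = []
    uncommitted = memory_pages
    for name, size in ws_sizes.items():
        if size <= uncommitted:
            running.append(name)
            uncommitted -= size
    return running
```

For instance, `admit({"a": 4, "b": 5, "c": 2}, 8)` admits a and c but holds b back, since b's working set does not fit in the four pages left after a is admitted.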


¹This complicates implementation, because now identification of
pages by program is required.

These seven are a portfolio of the most interesting and most
important rules.

In the next section we will compare all these strategies
and show that WS is optimum.

3.5. Working Set Strategies are Optimum


The following demonstration that working set strategies are 
optimum is based on the concept of locality, because we shall 
rate strategies by their ability to retain a process’s favored 
pages in main memory. 

First we show that the missing-page probability is a valid 
measure of performance for a paging policy. Then, using the 
convexity properties of working set size and missing-page prob- 
ability (Theorems 3.4 and 3.5) we show that the working set 
strategies have the lowest missing-page probabilities. For ease 
in discussion, we start by studying the algorithms operating on 
one program in a cramped memory. We then generalize to the case 


that many programs reside together in memory. 


3.5.1. The Cost of a Strategy 


Suppose h(t) is the number of pages of memory held by a
certain program at time t. Define the cost C(I) for memory usage
over the real time interval I to be

(3.5.1)    $C(I) = \int_I h(t)\,dt$

which is the space-time product of memory usage. We will say
that the best strategy is the one that produces minimum cost.
C(I) includes page wait times in the interval I; even though the
process is not running, its information still occupies space
during page waits.

For convenience, we shall deal with the cost per unit
virtual time, G, which we define to be

(3.5.2)    $G = \dfrac{C(I)}{v(I)}$

where v(I) is the amount of virtual time contained in I. We
are interested in the paging policy with smallest G.

We consider a certain program consisting of r pages, which
is operating in a space of s memory pages. The missing-page
probability $\mu(s)$ is the probability that the process references
a page not in main memory, when s pages are in main memory;
clearly, $\mu(s)$ depends on the paging algorithm¹.

It is important to note that $\mu(s)$ is an average. To measure
$\mu(s)$ experimentally, one would run a program in a space s, using
a given paging algorithm, for a virtual time interval of length V.
If one observed R references to pages not in memory, one would
assign $\mu(s) = R/V$. Thus, $\mu(s)$ is also the rate at which page faults
occur, for in a virtual time interval of length V, we expect
$V\mu(s)$ page faults.

We have sketched $\mu(s)$ in Figure 3-10 for two strategies,
which we shall call 1 and 2 (cf. Figure 3-1).

The cost per unit virtual time G(s) of a strategy, as a
function of the number s of pages in main memory, is related to
$\mu(s)$ as follows. Suppose the program has executed for V vtu,
and suppose $\mu(s)$ is constant over this interval. The expected
number of page waits is $V\mu(s)$, and so the total elapsed real
time is

(3.5.3)    $t_r = V + V\mu(s)T = V(1+\mu(s)T)$

where each page wait costs one traverse time T. The memory
space s is constant for this interval, so the cost is

(3.5.4)    $s\,t_r = sV(1+\mu(s)T)$

and the cost per unit virtual time is

(3.5.5)    $G(s) = \dfrac{s\,t_r}{V} = s(1+\mu(s)T)$

We have sketched G(s) for the two strategies, 1 and 2, in
Figure 3-11. Because $\mu(s)$ for strategy 2 is flatter than for
strategy 1, the optimum memory size $s_{opt}$ is smaller for
strategy 2 than for strategy 1. Moreover, because

(3.5.6)    $\mu_2(s) \le \mu_1(s)$

it follows that

(3.5.7)    $G_2(s) \le G_1(s)$

¹When the missing-page probability is a function of memory space
s, we will write it as $\mu(s)$. When it depends only on the working
set parameter $\tau$, we will write it as $\lambda(\tau)$.

Figure 3-10. Missing-page probability for two strategies.

Figure 3-11. Cost per unit virtual time for two strategies.

We therefore obtain two important conclusions. First,
the smaller the average missing-page probability, the cheaper
is the policy. Missing-page probability is therefore a valid
performance measure. Second, the smaller the average
missing-page probability, the smaller is the optimum memory space $s_{opt}$.
Hence, under better strategies, more programs can be placed in
memory.
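
Equation (3.5.5) is easy to evaluate once a missing-page curve is given. The sketch below uses two hypothetical curves, `mu_1` and `mu_2`, chosen only so that strategy 2 is uniformly the better one; they are not measured data, and for simplicity they do not fall exactly to zero when the whole program is in memory. The traverse time of 1000 vtu and the function names are also assumptions.

```python
import math
from typing import Callable


def cost_per_vtu(s: int, mu: Callable[[int], float], traverse: float) -> float:
    """G(s) = s (1 + mu(s) T), equation (3.5.5)."""
    return s * (1.0 + mu(s) * traverse)


def optimum_space(r: int, mu: Callable[[int], float], traverse: float) -> int:
    """Memory size in 1..r with the smallest cost per unit virtual time."""
    return min(range(1, r + 1), key=lambda s: cost_per_vtu(s, mu, traverse))


def mu_1(s: int) -> float:
    """Hypothetical missing-page probability for strategy 1."""
    return math.exp(-s / 10.0)


def mu_2(s: int) -> float:
    """Hypothetical missing-page probability for strategy 2 (never larger)."""
    return math.exp(-s / 5.0)


T = 1000.0  # assumed traverse time, in units of virtual time
s_opt_1 = optimum_space(50, mu_1, T)
s_opt_2 = optimum_space(50, mu_2, T)
```

For these curves strategy 2 costs no more than strategy 1 at every memory size, as (3.5.7) requires, and its optimum memory size is no larger.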

If the working set parameter $\tau$ is properly chosen (Chapter 4),
it is possible to cause a working set strategy to operate at
or near its current value of $s_{opt}$.

3.5.2. Single Program Case

Imagine an experiment (depicted in Figure 3-12) in which
the same program is executed in memories A, B, C, and D; memory
A uses RAND replacement, B uses FIFO, C uses LRU, and D uses
FINUFO. Let $\mu(s)$ denote the missing-page probability when a
fraction s ($0 \le s \le 1$) of the program is in memory. We claim that

$$\mu_A(s) \ge \mu_C(s)$$

$$\mu_B(s) \ge \mu_C(s)$$

$$\mu_D(s) \ge \mu_C(s)$$

Since we assume locality is a basic program property, the
question is: How well do RAND, FIFO, LRU, and FINUFO keep a program’s
favored pages in memory?

To answer the question we imagine that we are trying to
measure $\mu_A(s)$, $\mu_B(s)$, $\mu_C(s)$, and $\mu_D(s)$ by observing the rate
at which page faults occur.

Since a process references its favored pages most often,
we expect that the least recently referenced pages in memory are
the least favored; thus, LRU tends to retain favored pages. RAND
may very easily select a favored page, even one that LRU would
not; thus, we expect RAND to induce more page faults over an
execution interval than LRU, and so $\mu_A(s) \ge \mu_C(s)$. Under FIFO,
it is certain that every page will eventually be removed; thus,
we expect FIFO to induce more page faults over an execution
interval than LRU, and so $\mu_B(s) \ge \mu_C(s)$.

We make no claim that one or the other of RAND and FIFO
is better. On the one hand, there are cases in which RAND is
better than FIFO (e.g., an unchanging set of favored pages --
FIFO eventually removes every one of them, whereas RAND may not).
On the other hand, there are cases for which FIFO is better than
RAND (e.g., a process which changes its set of favored pages
completely before FIFO completes a cycle -- FIFO removes the
old pages first, whereas RAND may select some of the new favored
pages). FIFO, of course, is cheaper to implement.

Figure 3-12. Conceptual experiment to compare strategies.

Figure 3-13. Missing-page probabilities for the strategies.

The FINUFO algorithm operates nearly the same as LRU, there
being little difference, except in cost of implementation. If
all the use bits have been set, FINUFO will do worse than LRU
for the following reason. On its first cycle through memory,
FINUFO finds all use bits ON, and clears them; since the process
is in page wait, the use bits remain OFF, so on its second cycle
through memory, FINUFO will select the first page whose use bit
it cleared on the first cycle. Thus, FINUFO essentially selects
a page at random. Assuming correlation between age and
usefulness, we expect that there are situations in which LRU induces
fewer page faults during an execution interval than FINUFO, and
so on the average $\mu_D(s) \ge \mu_C(s)$.

In Figure 3-13 we show $\mu(s)$ sketched for memories A, B,
C, and D in our conceptual experiment. In Region I, all the
policies behave equally poorly, because too few pages are in
memory. In Region II, the differences become apparent. At s=1,
all the policies are again the same, since the program is
entirely present in memory.

To complete the discussion, we must show that WS is better
than LRU.

We imagine a second experiment, to compare LRU and WS,
shown in Figure 3-14. Memory A, of size M, is run under LRU.
Memory B, of variable size, runs under WS with $\tau$ fixed at $\tau_0$
such that the average working set size is $w(\tau_0) = M$. Note that
LRU and WS are very similar in operation: LRU keeps the M
most recently used pages in memory, whereas WS keeps the $w(t,\tau_0)$
most recently used pages in memory.

Figure 3-15 compares the behavior of the two policies.
Figure 3-15a shows $\tau$ fixed at $\tau_0$; at two times $t_1$ and $t_2$ the
working set size is $w(t_1,\tau_0) = w_1$ and $w(t_2,\tau_0) = w_2$, and so the
size of memory B varies at least over the range $(w_1, w_2)$.
Figure 3-15b shows that memory A does not vary, but is fixed at M.
Hence at the times $t_1$ and $t_2$ memory A is operating at $\tau_1$ and
$\tau_2$, respectively. That is, we may hold the working set size
fixed at M by varying $\tau$ so that the working set is always exactly
contained in memory A.

Thus, memory A is simulated by a working set strategy with
$\tau$ varying around mean $\tau_0$ on the range $(\tau_1, \tau_2)$, and memory B has
$\tau$ fixed at $\tau_0$. Writing the missing-page probability for WS as
$\lambda(\tau)$, and recalling Theorem 3.4,

$$\overline{\lambda(\tau)} \ge \lambda(\tau_0)$$

and so WS is at least as good as LRU.

Figure 3-14. Experiment to compare LRU and WS.

Figure 3-15. Comparison of LRU and WS.

Intuitively this makes sense. Suppose the program is
constant at size M throughout execution except for a single
reference to the (M+1)st page. If it is in memory A, the reference
to the (M+1)st page displaces some other page in A, which must
be recalled.

Note that we have also shown that a variable size share of 
memory is superior to a fixed size share of memory. This makes 
sense since: 

1. As we have stressed, advance knowledge of program size 
is often non-existent, and indeterminable. The share 
cannot be chosen optimally in advance of execution. 

2. If each program gets a fixed share of memory, we cannot 
guarantee that memory is densely packed with the most 
useful information. A small program operating in too 
large a space is occupying space it does not need, space 
which could and should be given over to a large program 
operating in too small a space. Using variable shares 
permits allocating space on the basis of need. 

If the programs in question do not satisfy locality, the
arguments above fall apart. Consider, for example, the case of
an (n+1)-page program which cycles endlessly through the (n+1)
pages; operate this program in a memory of size n. Clearly, the
least recently used page is the one about to be referenced, so
LRU makes the worst possible decision. Similarly, FIFO and FINUFO
remove a page just before it is referenced. Only RAND has
non-zero probability of not making a mistake, and is the best of the
four. If the working set parameter is $\tau \le n$, the working set
never contains the page next to be referenced, and so WS is
poor in this case too.
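
The cyclic counterexample is simple to simulate. The sketch below counts page faults for an (n+1)-page cyclic program in n frames under LRU, FIFO, and RAND, and counts references outside the working set for WS. The function names, the seeded random number generator, and the choice n = 4 with 1000 references are assumptions made for the illustration.

```python
import random
from collections import OrderedDict, deque
from typing import List


def lru_faults(refs: List[int], m: int) -> int:
    resident: "OrderedDict[int, None]" = OrderedDict()
    faults = 0
    for page in refs:
        if page in resident:
            resident.move_to_end(page)
            continue
        faults += 1
        if len(resident) >= m:
            resident.popitem(last=False)  # least recently used page
        resident[page] = None
    return faults


def fifo_faults(refs: List[int], m: int) -> int:
    queue: deque = deque()
    faults = 0
    for page in refs:
        if page in queue:
            continue
        faults += 1
        if len(queue) >= m:
            queue.popleft()               # page least recently paged in
        queue.append(page)
    return faults


def rand_faults(refs: List[int], m: int, seed: int = 0) -> int:
    rng = random.Random(seed)
    resident: List[int] = []
    faults = 0
    for page in refs:
        if page in resident:
            continue
        faults += 1
        if len(resident) >= m:
            resident[rng.randrange(m)] = page  # random victim
        else:
            resident.append(page)
    return faults


def ws_faults(refs: List[int], tau: int) -> int:
    """References to pages outside the working set of the last tau references."""
    return sum(1 for t, page in enumerate(refs)
               if page not in refs[max(0, t - tau):t])


n = 4
refs = [t % (n + 1) for t in range(1000)]  # cycles through n + 1 pages
```

With these definitions LRU and FIFO fault on every one of the 1000 references, as does WS with $\tau = n$, while RAND faults on only some of them. Raising the window to $\tau = n+1$ leaves WS with only the five initial references outside the working set, at the price of holding all n+1 pages.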


3.5.3. Multiprogrammed Case 


How do programs interact with each other, if at all, under 
each of these strategies? How can the memory demands of one 
program interfere with the execution of another? We can obtain 
answers to these questions by examining the missing-page prob- 
ability. 

The missing-page probability $\mu$ is the probability that a
process makes a reference to a page not in main memory. In the
multiprogrammed case, we expect it to be a function of the
program size r (r is the number of pages in the program), of the
number n of programs simultaneously resident in main memory, and
of the main memory size M:

(3.5.8)    (missing-page probability) = $\mu(n,r,M)$

In the following discussion we assume that locality is a basic
behavior property.

Suppose there are n programs in main memory; intuitively
we expect that if the totality of working sets does not exceed
the main memory size M, then no program loses its favored pages
to the expansion of another. That is, as long as

(3.5.9)    $\sum_{i=1}^{n} w_i(t,\tau_i) \le M$

there will be no interaction among programs, and we expect the
missing-page probability to be small. But when n exceeds some
critical number $n_0$, the totality of working sets exceeds M, the
expansion of one program displaces the favored pages of another,
and so the missing-page probability increases sharply with n.
Thus, we have

(3.5.10)    $\mu(n_1,r,M) > \mu(n_0,r,M)$  if  $n_1 > n_0$

This is illustrated in Figure 3-16. In other words, it costs
more to operate a program in a crowded memory than to operate
it in a roomy memory.

If a paging algorithm operates in the range $n > n_0$, we
will say it is saturated.

Next we want to show that the RAND, FIFO, LRU, and FINUFO
algorithms have the property that

(3.5.11)    $\mu(n,r_1,M) > \mu(n,r_2,M)$  if  $n > n_0$  and  $r_1 > r_2$

That is, a large program is more likely to lose pages than a small
program, when the algorithm is saturated. Put another way, it
costs more per page to operate a large program in a crowded memory
than to operate a small program in a crowded memory.

To see that this is true for RAND, observe that a large program
occupies more space in memory than a small program, and so
has more pages as candidates for random selection.
To see that this is true for FIFO, observe that a large program
tends to execute longer than a small one, and is thus more likely
to be still in execution when FIFO gets around to replacing its
pages. To see that this is true under LRU, recall that if program
$P_1$ is bigger than $P_2$, then the mean interreference intervals
satisfy $\bar{x}_1 > \bar{x}_2$; the large programs are the ones that
tend to reference the least recently used pages. To see that this is
true for FINUFO is more difficult. If $n > n_0$, all the use bits
are ON, until some program stops for a page wait; since it can
no longer set its use bits, such a program will tend to lose all
its pages. Inasmuch as large programs have more pages, a large
program will suffer more when it enters page wait.

Figure 3-16. Behavior of missing-page probability $\mu(n,r,M)$.

By definition, a WS algorithm makes the missing-page probability
independent of n and M, since eq. 3.5.9 is assumed to
be satisfied. In fact, Theorem 3.3 shows that the missing-page
probability depends only on $\tau$ in this case: $\lambda(\tau) = 1 - F_x(\tau)$.

Thus, the RAND, FIFO, LRU, and FINUFO policies result in 
higher costs when the memory is crowded. By avoiding crowding, 
WS results in lower cost. 

If we ask how well each strategy fares in keeping the working
set of a process in memory, we again see that FIFO and RAND are
worse than LRU, which is in turn comparable to FINUFO. If we
regard the entire memory contents as a large multiprocess
computation, the same arguments of the preceding section show
that WS results in lower missing-page probability than LRU. If
FINUFO is kept away from saturation, it should perform nearly as
well as WS. (FINUFO is nearing saturation when it cycles once
through the memory in a time comparable to the traverse time T.)

It might be argued that our comparison of LRU and WS (at
least in the single program case) is not strictly valid because
WS operates with surplus memory. That is, a larger memory is
needed in order to provide buffer space to absorb working set
expansions. This is quite true if only one program is using
memory. In the multiprogrammed case WS shows its superiority:

1. WS makes programs independent in the sense that the
   expansion of one program cannot displace working set
   pages of another.

2. When the size of the memory becomes large, the fractional
   requirement for buffer space to absorb working set
   expansions becomes small. This is shown in Chapter 8.

3. If $\tau$ is properly chosen, each program operates in the
   vicinity of its optimum cost size ($s_{opt}$ in Figure 3-11);
   thus it is possible to fit more programs cheaply
   into memory under a WS strategy than under any other
   strategy.


3.5.4. Use of Biased Replacement Rules 


Belady has shown that biasing the FIFO rule (see Section 3.4)
on the M44/44X computer improved performance significantly [B2].
We wish to show that this is true in general: by slowly varying
the memory share of a program the probability of referencing a
missing page is reduced. A WS strategy is still superior because
it varies the memory share exactly in accordance with need,
whereas the biased rules do not guarantee that the memory share
enlarges at the times the program would like to see it enlarged.

We shall show that biasing the LRU rule improves performance, 
but not to the point of WS. Since LRU is better than RAND or 
FIFO, it follows that biasing RAND or FIFO produces corresponding 
improvements. Since a non-saturated FINUFO rule behaves very 
much like a WS rule, there is little point to biasing it. 

To show biasing improves LRU we shall show biasing increases
the average value of $\tau$ for each value of the random variable
$w(t,\tau)$. Suppose $w(t,\tau)$ is known; choose $\tau = \tau_0$ such that
$\tau_0 = w^{-1}(t,M)$, where M is the memory size and $w^{-1}$ is the
inverse function of $w(t,\tau)$ with respect to $\tau$. Now let the
memory size be s, and let s vary such that $\bar{s} = M$. Since
$w^{-1}(t,s)$ is concave (see Figure 3-9), we have from Theorem 3.1

    $\overline{w^{-1}(t,s)} > w^{-1}(t,M)$

that is,

    $\bar{\tau} > \tau_0$

Since the average value of $\tau$ has been increased, the missing-page
probability $\lambda(\tau)$ must decrease:

    $\lambda(\bar{\tau}) < \lambda(\tau_0)$

Of course, since the memory variation is out of phase with the
variation of $w(t,\tau)$, a pure WS strategy is better.
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3.6. Thrashing 
It has been observed that even the slightest attempt to
overuse memory may trigger a total collapse of service efficiency,
rather than the moderate degradation that might be expected.
This phenomenon is known as thrashing. We show that thrashing
is caused by the large value of the traverse time T.

In this section we write $\mu$ for the missing-page probability.


3.6.1. The Causes 

Suppose that a certain process has executed for a virtual
time interval of length V and that the missing-page probability
$\mu$ is constant over this interval. The expected number of page
waits is then $V\mu$, each costing one traverse time T. We define
the duty factor $\eta(\mu)$ to be:

    $\eta(\mu) = \frac{\text{(elapsed virtual time)}}{\text{(elapsed virtual time)} + \text{(elapsed page wait time)}}$

(3.6.1)    $\eta(\mu) = \frac{V}{V + V\mu T} = \frac{1}{1 + \mu T}$

$\eta(\mu)$ measures the ability of a process to use a processor.
Figure 3-17 shows $\eta(\mu)$ sketched for five values of T:

    T = 1, 10, 100, 1000, 10000 vtu

If 1 vtu is taken to be 1 microsecond, and the rotation time of the
fastest existing rotating auxiliary storage devices is taken to
be 10 milliseconds, then T=10000 vtu may be regarded as typical
for existing computer systems.

Figure 3-17. Duty factor $\eta(\mu)$ for various T.

The slope of $\eta(\mu)$ is

(3.6.2)    $\eta'(\mu) = \frac{d}{d\mu}\,\eta(\mu) = -\frac{T}{(1 + \mu T)^2}$

which, for small $\mu$ and $T \gg 1$, is extremely sensitive to a change
in $\mu$. It is this extreme sensitivity of $\eta(\mu)$ to changes in $\mu$
for large T that is responsible for thrashing.
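
To make the sensitivity concrete, the following sketch tabulates the duty factor of eq. 3.6.1 and its slope from eq. 3.6.2 for the five values of T mentioned above. The function names and the sample values of the missing-page probability are illustrative assumptions.

```python
def duty_factor(mu, T):
    """Eq. 3.6.1: fraction of time spent running, 1 / (1 + mu*T)."""
    return 1.0 / (1.0 + mu * T)


def duty_slope(mu, T):
    """Eq. 3.6.2: derivative of the duty factor with respect to mu."""
    return -T / (1.0 + mu * T) ** 2


if __name__ == "__main__":
    sample_mu = (0.0, 1e-4, 1e-3, 1e-2)   # assumed sample points
    for T in (1, 10, 100, 1000, 10000):
        row = ", ".join(f"{duty_factor(mu, T):.3f}" for mu in sample_mu)
        print(f"T={T:>5} vtu  duty factor: {row}  slope at mu=0: {duty_slope(0.0, T):.0f}")
```

With T = 10000 vtu, a missing-page probability of only one in ten thousand already halves the duty factor.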

To show how the slightest attempt to overuse memory can
wreck processing efficiency, we perform the following conceptual
experiment. We imagine a set of (n+1) identical programs, n of
which are initially operating together in a memory, at the verge
of saturation (i.e., $n = n_0$ in Figure 3-10) with no sharing. Then
we examine the effect of introducing the (n+1)st program.

Let 1,2,...,(n+1) be this set of (n+1) programs, each of
size r. Initially, n of them occupy the memory, so that the
memory size is M=nr. Let $\mu_0$ denote the missing-page probability
under these circumstances, assume $\mu_0 \ll 1$, and that $\eta(\mu_0)$ is
reasonable (i.e., it is not true that $\eta(\mu_0) \ll 1$). Then the
expected number of busy processors (ignoring the cost of switching
a processor between processes) is:

(3.6.3)    $N_0 = \sum_{i=1}^{n} \eta_i(\mu_0) = \frac{n}{1 + \mu_0 T}$

Now introduce the (n+1)st program. The missing-page probability
increases to $(\mu_0 + \delta)$ and the expected number of busy processors
becomes

(3.6.4)    $N_1 = \sum_{i=1}^{n+1} \eta_i(\mu_0 + \delta) = \frac{n+1}{1 + (\mu_0 + \delta) T}$

Now, if nr pages consume the memory and we squeeze another size r
program into memory, the resulting increase in missing-page
probability is

(3.6.5)    $\delta = \frac{r}{(n+1)r} = \frac{1}{n+1}$

since we are assuming that the paging algorithm acquires the
additional r pages by displacing r pages uniformly from the (n+1)
programs now resident in memory. The fractional number of busy
processors after introduction of the (n+1)st program is

(3.6.6)    $\frac{N_1}{N_0} = \frac{n+1}{n} \cdot \frac{1 + \mu_0 T}{1 + (\mu_0 + \delta) T}$

Now, assume $T \gg n \gg 1$. We argue that $\delta = \frac{1}{n+1} \gg \mu_0$. To see this,
suppose to the contrary that $\mu_0 \ge \delta$; then

(3.6.7)    $\eta(\mu_0) \le \frac{1}{1 + \delta T} = \frac{1}{1 + \frac{T}{n+1}} = \frac{n+1}{n+1+T} \ll 1$

which contradicts our assumption that, in the non-saturated
operating region, efficiency is reasonable. Thus, when $T \gg n \gg 1$
and $\delta \gg \mu_0$ it is easy to show that

(3.6.8)    $\frac{N_1}{N_0} \ll 1$

The presence of one additional program has caused a complete
collapse of service.

The sharp difference between the two cases at first defies
the intuition, which might lead us to expect a gradual degradation
of service. The large value of the traverse time T is the
root cause. It is interesting to note that Smith [S7] has
warned of this behavior.
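
The conceptual experiment can be evaluated numerically. The sketch below follows eqs. 3.6.3 through 3.6.6 under the same uniform-displacement assumption; the function names and the particular values (n = 50 programs, an initial missing-page probability of 0.00001, T = 10000 vtu) are chosen here purely for illustration.

```python
def busy_processors(n_programs, mu, T):
    """Expected number of busy processors: n / (1 + mu*T)."""
    return n_programs / (1.0 + mu * T)


def collapse_ratio(n, mu0, T):
    """Busy processors before and after adding one more program.

    The increase in missing-page probability is taken to be 1/(n+1),
    as in eq. 3.6.5 (uniform displacement among n+1 identical programs).
    """
    delta = 1.0 / (n + 1)
    before = busy_processors(n, mu0, T)
    after = busy_processors(n + 1, mu0 + delta, T)
    return before, after, after / before


if __name__ == "__main__":
    before, after, ratio = collapse_ratio(n=50, mu0=1e-5, T=10000)
    print(f"busy before: {before:.2f}, after: {after:.3f}, ratio: {ratio:.4f}")
```

For these illustrative values roughly 45 processors are busy before the extra program arrives and well under one afterwards.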


3.6.2. The Cures 

To cure or prevent thrashing, we must do two things: first,
we must prevent the missing-page probability $\mu$ from fluctuating;
and second, we must reduce the traverse time T.

In order to prevent $\mu$ from fluctuating, we must be sure
that the number n of programs residing in main memory satisfies
$n \le n_0$ (Figure 3-16), which is equivalent to the condition that

(3.6.9)    $\sum_{i=1}^{n} w_i(t,\tau_i) \le M$

where $w_i(t,\tau_i)$ is the working set size of program i. In other
words, there must be space enough in memory for each program's
working set. This strongly suggests that a working set strategy
be used.
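
Condition 3.6.9 amounts to an admission test. The following is a minimal sketch, assuming working set sizes are known in pages and that programs are considered in arrival order; the function names are hypothetical, and the terms balance set and standby set are borrowed from the abstract.

```python
def can_admit(resident_ws_sizes, candidate_ws_size, memory_size):
    """True if the candidate's working set fits alongside the resident ones."""
    return sum(resident_ws_sizes) + candidate_ws_size <= memory_size


def admit_in_order(ws_sizes, memory_size):
    """Split programs into a balance set (admitted) and a standby set (held back).

    Programs are examined in the given order; one is admitted only if the
    total of working set sizes stays within memory_size (eq. 3.6.9).
    """
    balance, standby = [], []
    for w in ws_sizes:
        if can_admit(balance, w, memory_size):
            balance.append(w)
        else:
            standby.append(w)
    return balance, standby


if __name__ == "__main__":
    print(admit_in_order([30, 40, 50, 20], memory_size=100))
```

A program whose working set does not fit is held back rather than being allowed to displace the favored pages of the programs already running.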

In order to get the largest number of programs in memory,
that is, to maximize $n_0$, we want to choose $\tau$ as small as
possible and yet be sure that $W(t,\tau)$ contains a process's favored
pages. Programmers can cooperate in this effort by designing
algorithms to operate locally on data, consciously keeping the
working set small and not moving about too rapidly. A programmer
is rewarded for this effort, because not only does he achieve
a high operating duty factor, he also pays less for use of memory.

With the FIFO, RAND, and LRU algorithms, it is very difficult
to ascertain $n_0$, and therefore difficult to control the
possible $\mu$-fluctuations. The FINUFO algorithm displays some
natural tendency to refuse to run more than $n_0$ programs (the
extra ones tend to be completely unloaded).

The problem of reducing the traverse time T is more difficult.
Recall that T is the expectation of a random variable composed
of queue waits and mechanical delay factors. Using optimum
scheduling techniques on disk and drum auxiliary storage
devices [C2,D3,F3], together with parallel data channels, we
can effectively remove all but the mechanical delays from T;
accordingly, T may be made comparable to a disk arm seek time
or to half a drum revolution time. To reduce T further would
require reduction of the rotation time of the device (for
example, a 40,000 rpm drum).

A much more promising solution is to dispense altogether
with a rotating device as the second level of memory. A three-
level memory system (Figure 3-18) would be a possible solution,
where between the main level and the drum we have introduced a
slow speed bulk core storage. The analysis of Section 3.6.1
suggests that speed ratios in the order of 1:100 (i.e., $T \approx 100$
vtu) between adjacent devices would lead to much less sensitivity
to traverse times and permit tighter control over the factors
that cause thrashing. For example:

    level    type of memory device     access time
    0        thin film                 200 ns
    1        slow speed core           20 µs
    2        very high speed drum      2 ms

Figure 3-18. Three-level memory system (main and auxiliary levels).

We cannot overemphasize, however, the importance of a
sufficient supply of main memory, enough to contain the desired
number of working sets. Paging is no substitute for real main
memory.
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3.7. Survey of the Literature 


Various studies concerning the behavior of paging algorithms
have appeared. The earliest published study, by Fine et
al. [F6], investigates the effects of demand paging and seriously
questions whether paging is worthwhile at all. Their experiments,
as well as the more discerning experiments of Varian and Coffman
[V1], confirm this: if a program is forced to operate in a space
smaller than its working set, considerable paging activity may
seriously interfere with efficiency. The remedy is not to dismiss
paging; it is to provide enough main memory. Paging is
no substitute for real memory.

Experience with the M44/44X computer has yielded important
insights into program behavior [O1]. Belady and his colleagues,
noting the concavity of efficiency vs. core-share curves, were
able to improve efficiency significantly by artificially varying
a program's core share; this led to the biased replacement rules
[B2]. Belady has defined a unit of storage allocation, the
parachor [B2], which is that amount of information that must be
loaded in main memory for the program to spend no more than half
its time in page wait. We shall discuss in Chapter 4 the relation
between parachor and working set. Belady has also compared
some of the paging algorithms mathematically [B1]. His
most important conclusion in this area is that an ideal replacement
rule should have much of the simplicity of RAND or FIFO
(for efficiency) and some, though not much, accumulation of data
on past reference patterns.

Randell and Kuehner [R2] have a good survey of all the
techniques commonly used to handle multiprogrammed memory
allocation, ranging from various name space concepts, across
look-ahead and replacement rules, to problems of optimum page size.

Oppenheimer and Weizer [O2] report on simulations of the
RCA Spectra 70/46 Time-Sharing Operating System when memory
allocation is based on a strategy related to working sets. Their
experiments indicate that this type of allocation markedly
improves performance.
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3.8. Summary 


Starting from the assumption that locality is a basic program
behavior property, we developed the working set model for
program behavior. Locality is the property that, during any
virtual time interval, a process favors only a subset of the pages
available to it; a working set is a dynamic measure of this set
of pages. Locality manifests itself as convexity in the working
set size and as concavity in the missing-page probability.
Experimental evidence suggests that locality is a very good assumption.
There is every reason to believe that a programmer who keeps
in mind a working set concept can make this property strong
in his programs.

A good performance measure for paging policies is the missing-page
probability, since lower missing-page probabilities result
in lower memory usage costs. We showed that working set strategies
achieve the lowest missing-page probabilities and operate
dynamically in a memory space close to that which achieves minimum
cost.

We also showed that thrashing is directly traceable to the 
large value of the traverse time T. By minimizing the possibility 
of fluctuations in the missing-page probability, a working set 
strategy can markedly decrease sensitivity to thrashing. 

Thus, a working set strategy has three advantages. First,
it results in lowest costs of operating programs in memory.
Second, it reduces sensitivity to thrashing. Third, it makes
programs independent of one another, in the sense that memory
acquisitions of one program do not interfere with the working
set holdings of another. Because of this, analysis will be simple.

In the next chapter we refine the working set model and
derive detailed properties, in the case of no sharing. In
Chapter 5 we give attention to the case of sharing, when working
sets overlap. The reader who is interested in the ideas of
demand and balance should turn directly to Chapter 6.
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CHAPTER 4 


Further Properties of the Working Set Model 


4.0. Introduction

Having seen the basic concepts beneath the working set model, 
we are in a position to investigate its properties more thoroughly. 
Here in this chapter we shall refine the working set model in 
the very simplest case: single-process computations with no infor- 
mation sharing. In Chapter 5 we shall investigate the additional 
complications that arise from multiprocess computations and over- 
lapping working sets. 

One of the properties of a working set memory management
policy is the statistical independence of working sets: the
expansion of one working set cannot displace pages of another.
Because of this, we may analyze the behavior of a single process
and its working set, and then extend the results in a simple way
to collections of independent processes with non-overlapping
working sets.

The quantities we shall derive in this chapter are
described in the following table.


quantity                       symbol             description

missing-page probability       $\lambda(\tau)$    probability that a process,
                                                  when making an information
                                                  reference, directs the
                                                  reference to a page not in
                                                  the working set $W(t,\tau)$.

paging rate                    $\rho(\tau)$       number of pages per unit
                                                  real time re-entering a
                                                  working set $W(t,\tau)$.

expected working set size      $\omega(\tau)$     expected number of pages
                                                  in working set $W(t,\tau)$.

variance of working set size   $\sigma^2(\tau)$   --

duty factor                    $\eta(\tau)$       fraction of time a running
                                                  or page-wait process spends
                                                  running.

$\tau$-sensitivity             $s(\tau)$          rate of increase of
                                                  missing-page probability
                                                  with decrease in $\tau$.

The interreference distribution $F_x(u)$ plays a key role in the
analysis, since all these quantities may be expressed in terms
of $F_x(u)$.

We begin by deriving an important result: the mean interreference
interval is also the mean program size. An interesting
consequence of this is that the expected working set size depends
only on the interreference distribution. We derive, one by one,
the quantities listed in the table above; then we show how each
of these quantities is useful in determining the allowable range
of $\tau$-values to be used; we discuss the problem of predicting
working set sizes; and finally we discuss how a working set memory
allocation strategy might be implemented.




During the remainder of this chapter we assume:

1. No sharing. That is, working sets do not overlap.

2. Single-process computations.

3. Only working-set pages are in memory. Any page which
   leaves a working set is automatically removed from
   main memory.

4. Unlimited processor-memory resources. But, since only
   a finite number of processes, each with a finite working
   set, are active, only a finite amount of each resource
   type is in use.
The third assumption is a worst-case assumption, in the sense 
that a working set strategy would not normally retire a non- 
working-set page until there was need for the space it occupied. 
The fourth assumption allows us to ignore for the time being 
whatever additional problems arise from lack of equipment. We 
shall discuss these problems in Chapter 8, when we examine the 


equipment configuration. 
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4.1. The Relation between Program Sizes and Interreference Intervals 


As before, we define the random variable x to be the virtual
time interval between successive references to the same page.
Thus, these intervals x are the interarrival times between
references.

The distribution function is $F_x(u) = \Pr[x \le u]$. The density
function is $f_x(u) = \frac{d}{du} F_x(u)$. The mean is

(4.1.1)    $\bar{x} = \int_0^\infty u\, f_x(u)\, du = \int_0^\infty (1 - F_x(u))\, du$

where this latter integral can be verified by integrating the
former by parts¹. The second moment is

(4.1.2)    $\overline{x^2} = \int_0^\infty u^2 f_x(u)\, du$

and the variance is $\sigma_x^2 = \overline{x^2} - \bar{x}^2$. We assume both $\bar{x}$ and $\overline{x^2}$ are
finite (that $\bar{x}$ is finite is shown shortly in Theorem 4.1).


¹The formula is: $\int y\, dz = yz - \int z\, dy$. In eq. 4.1.1 let y=u, dy=du,
$dz = f_x(u)\, du$, $z = F_x(u)$. We integrate from 0 to a, and let a tend
to infinity when we are done. Then

    $\int_0^a u f_x(u)\, du = yz \Big|_0^a - \int_0^a z\, dy = u F_x(u) \Big|_0^a - \int_0^a F_x(u)\, du - a + \int_0^a du$

where we have added and subtracted $a = \int_0^a du$. Noting that
$u F_x(u) \big|_0^a = a F_x(a)$ and regrouping terms,

    $\int_0^a u f_x(u)\, du = \int_0^a (1 - F_x(u))\, du - a(1 - F_x(a))$

To complete the proof, we must show $a(1 - F_x(a))$ tends to 0 as a
tends to infinity, when x has finite mean. Now, $a(1 - F_x(a)) \to 0$
if and only if $(1 - F_x(a))/(1/a) \to 0$ if and only if (L'Hospital's
Rule) $f_x(a)/(1/a^2) \to 0$, which is exactly the condition that $\bar{x}$ be
finite.
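
The identity in eq. 4.1.1 is easy to check numerically for a specific distribution. The sketch below assumes, purely for illustration, an exponential interreference distribution with mean 2; the function names, the truncation point of the integral, and the step count are likewise assumptions.

```python
import math


def trapezoid(g, a, b, steps):
    """Composite trapezoid rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for k in range(1, steps):
        total += g(a + k * h)
    return total * h


def check_mean_identity(mean=2.0, upper_multiple=60, steps=20000):
    """Evaluate both integrals of eq. 4.1.1 for an exponential distribution.

    Returns (integral of u*f(u), integral of 1-F(u)), each truncated at
    upper_multiple * mean, where the remaining tail is negligible.
    """
    F = lambda u: 1.0 - math.exp(-u / mean)      # distribution function
    f = lambda u: math.exp(-u / mean) / mean     # density function
    upper = upper_multiple * mean
    lhs = trapezoid(lambda u: u * f(u), 0.0, upper, steps)
    rhs = trapezoid(lambda u: 1.0 - F(u), 0.0, upper, steps)
    return lhs, rhs


if __name__ == "__main__":
    print(check_mean_identity())
```

Both integrals come out at the assumed mean, to within the accuracy of the quadrature.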
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By a program Z(t) at time t we mean the set of pages toward 
which a process directs its information references¹. The program
size is $z(t) = |Z(t)|$. We assume that z(t) is a stationary
stochastic process, so that we may write $\bar{z}$ instead of $\overline{z(t)}$, and
$\overline{z^2}$ instead of $\overline{z^2(t)}$. The program size distribution is $F_z(u)$,
across the ensemble of all programs.

We must be careful not to confuse a program Z(t) with a working
set $W(t,\tau)$. A working set $W(t,\tau)$ is related to a program
Z(t), thus:

(4.1.3)    $W(t,\tau) \subseteq \bigcup_{s \in (t-\tau,\,t)} Z(s) = Z(t,\tau)$

where $(t-\tau,\,t)$ is a virtual time interval. Because of our
assumption of locality, we assume also that the content of Z(t) does
not change appreciably over intervals of length $\tau$, so that the
size of $Z(t,\tau)$ is described by the random variable z. Thus,
we assume

(4.1.4)    $|Z(t,\tau)| = z$

Recalling the definition of the working set size $w(t,\tau)$, we have

(4.1.5)    $w(t,\tau) \le z$

and hence

(4.1.6)    $\omega(\tau) \le \bar{z}$

¹A more detailed view would note that a program contains the
instruction stream that directs the activity of a process,
together with the data used. However, we do not require this
much detail in our analysis.

Theorem 4.1. Let x be the interreference intervals, with mean $\bar{x}$.
Let z be the program sizes, with mean $\bar{z}$. Then $\bar{x} = \bar{z}$.

Proof: Refer to Figure 4-1, where we have shown a set of z
pages constituting a certain program. Let

(4.1.7)    $p_i = \Pr[\text{process references page } i, \text{ when program size is } z]$

then,

(4.1.8)    $\sum_{i=1}^{z} p_i = 1$

Now, let

(4.1.9)    $(x|z)_i$ = interreference interval to page i,
                      when program size is z.

In a sequence of independent trials, with $\Pr[\text{success}] = p_i$, the
expected waiting time until success is $1/p_i$. Thus,

(4.1.10)    $\overline{(x|z)_i} = \frac{1}{p_i}$

For the entire program,

(4.1.11)    $\overline{(x|z)} = \sum_{i=1}^{z} \overline{(x|z)_i}\; p_i = \sum_{i=1}^{z} \frac{1}{p_i}\; p_i = z$

Now, taking the expectation on z,

(4.1.12)    $\bar{x} = \overline{\overline{(x|z)}} = \bar{z}$

QED.

Corollary 4.1. Let $Z_1$ and $Z_2$ be programs; let $x_1$ be the
interreference intervals to $Z_1$ and $x_2$ be the interreference
intervals to $Z_2$. If $Z_1$ is bigger than $Z_2$, then $\bar{x}_1 > \bar{x}_2$.

Proof: Let $z_1 = |Z_1|$ and $z_2 = |Z_2|$. We have

    $\bar{x}_1 = \overline{(x|z_1)} = z_1 > z_2 = \overline{(x|z_2)} = \bar{x}_2$
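
Theorem 4.1 can be illustrated by simulation. The sketch below assumes the independent-trials model used in the proof: each reference goes to page i with a fixed probability, independently of earlier references, at one reference per vtu. The function name, the skewed weights, and the run length are illustrative assumptions.

```python
import random


def mean_interreference_interval(weights, length, seed=0):
    """Average interval between successive references to the same page.

    References are drawn independently, page i with probability
    proportional to weights[i]. Every reference after the first to a
    page contributes one interval to the average.
    """
    rng = random.Random(seed)
    refs = rng.choices(range(len(weights)), weights=weights, k=length)
    last_seen = {}
    total, count = 0, 0
    for t, page in enumerate(refs):
        if page in last_seen:
            total += t - last_seen[page]
            count += 1
        last_seen[page] = t
    return total / count


if __name__ == "__main__":
    weights = [1.0 / (i + 1) for i in range(10)]   # 10 pages, skewed usage
    print(mean_interreference_interval(weights, 200000))
```

Even though the ten pages are used very unevenly, the average interval comes out close to the program size, 10, as the theorem states.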
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pages 


Figure 4-1. A simple program model (a process directs a stream of
references, at rate 1 per vtu, to the pages of a program).

Note, however, that we cannot also claim $F_z(u) = F_x(u)$; i.e.,
that the interreference distribution is also the program size
distribution. In a given computer system, there will
be some largest program size $z_0$. It is unreasonable to assume

(4.1.13)    $\Pr[x > z_0] = 0$

because even the largest program may contain pages it uses only
rarely. Thus, we expect that the variance $\sigma_x^2$ of interreference
intervals will be greater than the variance $\sigma_z^2$ of program size.

Since $F_x(u)$ and $F_z(u)$ describe the ensemble of all programs,
we can make no claim that any given program is reliably described
by $F_x(u)$ or $F_z(u)$. We can, however, claim that a balance set
of programs, being large, is representative of the ensemble; thus
the quantities we shall derive in the next sections, expressed
in terms of $F_x(u)$, are applicable to balance sets of programs.

A question which may have occurred to the reader is: How
does page size (number of words to a page) enter into our
considerations? Page size is accounted for implicitly in the
definitions of the interreference intervals x and the traverse time T.
On the one hand, halving the page size makes the same program
comprise twice as many pages; from Theorem 4.1, we see that the
interreference intervals become twice as long. That is, smaller
pages are referenced less often. On the other hand, the traverse
time T contains a component due to page transmission time, which
depends on page size.

Thus, provided that pages are sufficiently small that working
sets contain several pages, all our results are independent
of the page size.
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4.2. Missing-Page Probability 
We showed in Theorem 3.3 that the missing-page probability
depends on $F_x(u)$:

(4.2.1)    $\lambda(\tau) = 1 - F_x(\tau)$

and is just the probability that the page referenced satisfies
$x > \tau$. The next theorem shows that $\lambda(\tau)$ may be regarded as the
rate, in virtual time, at which pages re-enter the working set
$W(t,\tau)$; that is, $1/\lambda(\tau)$ is the expected virtual time interval
between references to a page not in $W(t,\tau)$. See Figure 4-2.

Theorem 4.2. Let $\lambda(\tau) = 1 - F_x(\tau)$ be the missing-page probability.
Then $\lambda(\tau)$ is also the number of pages per unit virtual
time re-entering $W(t,\tau)$.

Proof: Let Z(t) be a program. We consider first the behavior
of a typical page i in Z(t) and then obtain the behavior of Z(t)
by summing the behaviors of its component pages.

Let $\{t_n\}_{n \ge 0}$ be a sequence of virtual time instants at which
references to page i occur (Figure 4-3). The nth interreference
interval is

(4.2.2)    $x_n = t_n - t_{n-1}$

Now, we assume the interreference intervals $\{x_n\}_{n \ge 1}$ are
statistically independent¹, so that for all $n \ge 1$:

(4.2.3)    $f_{x_n}(u) = f_x(u)$

A re-entry point is a reference instant that finds the page not
in $W(t,\tau)$: at such an instant the page re-enters $W(t,\tau)$. Observe
that $t_n$ is a re-entry instant if and only if $x_n > \tau$, independent
of other reference instants. Suppose $t_0$ is a re-entry; we are
interested in $\pi_n$, the probability

(4.2.4)    $\pi_n = \Pr[t_n \text{ is first re-entry after } t_0]$

The probabilities $\{\pi_n\}_{n \ge 1}$ are distributed geometrically:

(4.2.5)    $\pi_n = (F_x(\tau))^{n-1} (1 - F_x(\tau))$

That is, $t_n$ is the first re-entry after $t_0$ if and only if each
of the intervals $x_1, \ldots, x_{n-1}$ satisfies $x_1 \le \tau, \ldots, x_{n-1} \le \tau$ and $x_n$
satisfies $x_n > \tau$. The expected number of references until the
re-entry is

(4.2.6)    $\bar{n} = \sum_{n=1}^{\infty} n\, \pi_n = \frac{1}{1 - F_x(\tau)}$

Each interreference interval x is of expected length $\bar{x}$, so the
expected time between re-entries is

(4.2.7)    $\bar{n}\, \bar{x} = \frac{\bar{x}}{1 - F_x(\tau)}$

Let us define the virtual time re-entry rate $\lambda_i(\tau)$ for page i
in Z(t) to be:

(4.2.8)    $\lambda_i(\tau) = \frac{1 - F_x(\tau)}{\bar{x}}$

Next, suppose Z(t) contains z pages. Given z, the total re-entry
rate for Z(t) is:

(4.2.9)    $(\lambda(\tau)|z) = \sum_{i=1}^{z} \lambda_i(\tau) = z\, \frac{1 - F_x(\tau)}{\bar{x}}$

Then, taking the expectation on z,

(4.2.10)    $\lambda(\tau) = \overline{(\lambda(\tau)|z)} = \bar{z}\, \frac{1 - F_x(\tau)}{\bar{x}}$

But, since $\bar{z} = \bar{x}$ (Theorem 4.1), we obtain finally

(4.2.11)    $\lambda(\tau) = 1 - F_x(\tau)$

QED.

¹This assumption does not contradict the assumption of locality.
Locality implies only that the favored pages have short
interreference intervals.

Figure 4-2. Illustrating the meaning of re-entry rates (pages
entering $W(t,\tau)$ for the first time; pages re-entering the working
set, at virtual time rate $\lambda(\tau)$ and real time rate $\rho(\tau)$; pages
leaving $W(t,\tau)$ for the last time).

Figure 4-3. Sequence of references to page i (virtual time axis
for process i, with re-entry instants marked).

Since $F_x(\tau)$ is a non-decreasing function of $\tau$, $\lambda(\tau)$ is a
non-increasing function of $\tau$. Thus, decreasing $\tau$ can never result
in a decrease in the missing-page probability.
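
The equivalence used in the proof (a reference is a re-entry exactly when the preceding interreference interval exceeds the working set parameter) can be checked on a reference trace. The sketch below assumes discrete virtual time with one reference per vtu and takes the working set to be the pages referenced during the previous tau references; the function names and the random trace are illustrative assumptions.

```python
import random
from collections import Counter, deque


def reentries_by_window(refs, tau):
    """Count references to previously used pages that are not among
    the pages referenced during the last tau references."""
    window = deque()        # the last tau references, oldest first
    in_window = Counter()   # how many times each page occurs in the window
    seen = set()
    count = 0
    for page in refs:
        if page in seen and in_window[page] == 0:
            count += 1      # page was used before but has left the working set
        seen.add(page)
        window.append(page)
        in_window[page] += 1
        if len(window) > tau:
            in_window[window.popleft()] -= 1
    return count


def reentries_by_interval(refs, tau):
    """Count references whose interreference interval exceeds tau."""
    last_seen = {}
    count = 0
    for t, page in enumerate(refs):
        if page in last_seen and t - last_seen[page] > tau:
            count += 1
        last_seen[page] = t
    return count


if __name__ == "__main__":
    rng = random.Random(3)
    refs = [rng.randrange(20) for _ in range(5000)]
    for tau in (1, 5, 20, 100):
        print(tau, reentries_by_window(refs, tau), reentries_by_interval(refs, tau))
```

The two counts agree for every value of the parameter, and the count never increases as the parameter grows, in line with the remark that follows the theorem.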
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4.3. Paging Rate 


Assuming that the memory management mechansim guarantees 
that a page resides in main memory if and only if it is in a work- 
ing set, every re-entry point in virtual time corresponds toa 
page wait in real time. Thus, every page peeeaver ins w(t,T) must 
be recalled from auxiliary memory, and contributes to page traffic. 
Define the paging rate p(t) to be the number of pages per 
unit real time re-entering the working set W(t,t). That is, 


1/p(t) is the expected real time between re-entries. See Figure 4-2. 


Theorem 4.3. Let p(t) be the paging rate. Then 
A(T) 

(4.3.1) et) = 

1+ A(t)T 

where T is the traverse time, and A(t) is the missing-page 


probability. 


Proof: The expected virtual time between re-entries is 1/X(T), 


by Theorem 4.2. Then the expected real time between re-entries 


is 
—i_ _ 1 
(4.3.2) p(T) = X(T) + T 
so that 
ACT) 
p(t) = 
1 +A(t)T 


QED. 


Observe that p(T) may be interpreted as 


number of re-entries 
(4.3.3) (tt) = eOororo—_—_—_— 
elapsed real time 
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because, in a virtual time interval of length V, there are VA(T) 
re-entires; each costs one traverse time T, so the elapsed real 
time must be (V + VA(T)T). 

With a balanced memory (i.e., the totality of working sets 
constituting the balance set B does not exceed memory), process j 
in the balance set B contributes p,(T) to the total returning 


page traffic Y(t): 


(4.3.4) W(t) = ’, p5(t) 
jeB 

so that ¥(t) estimates the total traffic of pages being recalled 
to memory, and is therefore a lower bound on the capacity required 
of the channel bridging the two levels of memory. 

The rate ¥(t) does not include page traffic resulting from: 

1. computations entering and leaving the balance set B; 

2. pages being referenced for the first or last time by 

processes in B (see Figure 4-2). 

Given the rate at which each of these occurs, one can estimate 
the true total paging rate. These adjustments are straightfor- 
ward, so we shall not pursue the matter further. 

We must emphasize that the rates p(τ) and Ψ(τ) are estimates
of steady-state behavior, under the assumptions of Section 4.1.
The important point is: starting from the interreference
distribution F_x(u) and the definition of W(t,τ), it is possible
to estimate these rates.
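
As a numerical illustration of eqs. 4.3.1 and 4.3.4, the following
minimal Python sketch evaluates the paging rate of each process and
the total returning page traffic of a balance set. The function
names and the sample values of the missing-page probabilities and of
the traverse time T are illustrative assumptions, not values taken
from the text.

```python
def paging_rate(lam: float, T: float) -> float:
    """Eq. 4.3.1: re-entries per unit real time.

    lam is the missing-page probability (re-entries per unit virtual
    time) and T is the traverse time; both are supplied by the caller.
    """
    return lam / (1.0 + lam * T)


def total_return_traffic(lams, T: float) -> float:
    """Eq. 4.3.4: sum of the paging rates of the processes in the balance set."""
    return sum(paging_rate(lam, T) for lam in lams)


# Hypothetical balance set of three processes; T = 10000 vtu is assumed.
T = 10_000.0
lams = [1e-4, 5e-5, 2e-4]
rates = [paging_rate(lam, T) for lam in lams]
psi = total_return_traffic(lams, T)
print(rates, psi)
```

Note that 1/paging_rate(lam, T) equals 1/lam + T, which is eq. 4.3.2.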
4.4. Working Set Size

Let Z(t) be a program, and let z(t) be the random variable
of the size of Z(t). Starting from the assumption that z(t)=z
is a stationary stochastic process, we shall derive expressions
for the mean and variance of working set size w(t,τ). The
importance of these results is that a program’s main memory
requirement is completely determined by its page interreference
activity.
Theorem 4.4. Let w(τ) = $\overline{w(t,\tau)}$ be the expected working
set size. Then

(4.4.1)    $w(\tau) = \int_0^\tau \lambda(u)\,du = \int_0^\tau [1 - F_x(u)]\,du$

where λ(u) is the missing page probability.


Proof: Refer to Figure 4-4, where we have shown an interval in
virtual time for a typical page in W(t,τ). Define the random
variable

(4.4.2)    y = length of the interreference interval
               containing the time instant t

Thus, if we choose a point t at random on the virtual time axis,
y is the length of the interval in which t lies. The density
function f_y(u) for y is not the same as that of the interreference
intervals x because, even though long intervals are less
likely than short intervals, they occupy a larger fraction of
the virtual time axis. A little thought should convince the
reader that the probability that t is contained in an interval
of length u is just the fraction of the time axis occupied by
intervals of length u:

(4.4.3)    $f_y(u) = \frac{u\,f_x(u)}{\bar{x}}$

[Figure 4-4. Interreference interval containing time t. The
interval of length y runs along the virtual time axis from a
reference instant at t=0 to the next reference instant at t=y.]

For a complete discussion of this property, see Feller [F2,Vol.2,
p.1l0ff]. Let i denote a typical page in the program Z(t).
Define the binary random variable

(4.4.4)    $y_i = \begin{cases} 1 & \text{if } i \in W(t,\tau) \\ 0 & \text{otherwise} \end{cases}$


Refer again to Figure 4-4, and use t=0 as the left end of the
interval. Suppose y=u; then

           $\Pr[y_i=0 \mid y=u] = \Pr[u>\tau \text{ and } t \in (\tau,u)]$

(4.4.5)    $\Pr[y_i=0] = \int_{u>\tau} \Pr[y_i=0 \mid y=u]\, f_y(u)\,du = \int_{u>\tau} \Pr[u>\tau \text{ and } t \in (\tau,u)]\, f_y(u)\,du$


Now, t may fall randomly on the interval (0,y), so

(4.4.6)    $\Pr[u>\tau \text{ and } t \in (\tau,u)] = \frac{u-\tau}{u}$

then

(4.4.7)    $\Pr[y_i=0] = \int_\tau^\infty \frac{u-\tau}{u}\, f_y(u)\,du = \int_\tau^\infty \frac{(u-\tau)\, f_x(u)}{\bar{x}}\,du$

carrying out this integration (by parts) we obtain

(4.4.8)    $\Pr[y_i=0] = 1 - \frac{1}{\bar{x}} \int_0^\tau \lambda(u)\,du$

where λ(u)=1-F_x(u). Then

(4.4.9)    $\Pr[y_i=1] = \frac{1}{\bar{x}} \int_0^\tau \lambda(u)\,du$
Now, observe that

(4.4.10)   $w(t,\tau) = \sum_{i \in Z(t)} y_i$

Suppose |Z(t)| = z. Then, given z, the expected working set size is

(4.4.11)   $(w(\tau) \mid z) = \overline{w(t,\tau)} = \sum_{i=1}^{z} \overline{y_i} = \sum_{i=1}^{z} \Pr[y_i=1] = z \Pr[y_i=1]$

Then, taking expectation on z,

(4.4.12)   $w(\tau) = \overline{(w(\tau) \mid z)} = \frac{\bar{z}}{\bar{x}} \int_0^\tau \lambda(u)\,du$

Finally, from Theorem 4.1 we have $\bar{z} = \bar{x}$, so that

           $w(\tau) = \int_0^\tau \lambda(u)\,du$

QED.
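
The theorem can be checked numerically. The sketch below integrates
a missing-page probability λ(u) by the trapezoidal rule and compares
the result with the closed form for exponentially distributed
interreference intervals (Section 4.9). The function name, the step
count and the sample values of the mean interval and of τ are
assumptions made for illustration only.

```python
import math


def expected_ws_size(lam, tau, steps=10_000):
    """Eq. 4.4.1 by the trapezoidal rule: integral of lam(u) for u in [0, tau]."""
    h = tau / steps
    total = 0.5 * (lam(0.0) + lam(tau))
    for k in range(1, steps):
        total += lam(k * h)
    return total * h


# Hypothetical program: exponential interreference intervals with mean x_bar.
x_bar = 50.0
beta = 1.0 / x_bar
tau = 100.0


def lam_exp(u):
    """Missing-page probability 1 - F_x(u) in the exponential case."""
    return math.exp(-beta * u)


w_numeric = expected_ws_size(lam_exp, tau)
w_closed = x_bar * (1.0 - math.exp(-beta * tau))
print(w_numeric, w_closed)
```

Any other λ(u) with values in [0,1] can be substituted for lam_exp;
the result is then at most τ, since the integrand is at most 1.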


We should verify that the properties of Theorem 3.3 are
satisfied:

1. w(τ) ≤ τ

2. w(0) = 0

3. w(τ+s) ≥ w(τ)   s>0

4. w(τ) convex.

Since λ(u)=1-F_x(u) ≤ 1,

(4.4.13)   $w(\tau) = \int_0^\tau \lambda(u)\,du \le \int_0^\tau du = \tau$

and properties 1 and 2 are satisfied. Since λ(u)≥0,

(4.4.14)   $w(\tau+s) = \int_0^{\tau+s} \lambda(u)\,du \ge \int_0^\tau \lambda(u)\,du = w(\tau)$

and property 3 is satisfied. To verify property 4, we show
that the second derivative of w(τ) is non-positive:

(4.4.15)   $\frac{d}{d\tau} w(\tau) = \lambda(\tau) = 1 - F_x(\tau)$

           $\frac{d^2}{d\tau^2} w(\tau) = -f_x(\tau) \le 0$,  since f_x(τ) ≥ 0.


Comparing the theorem statement with eq. 4.1.1, we observe
also that

(4.4.16)   $\lim_{\tau \to \infty} w(\tau) = \bar{x} = \bar{z}$

And, since $\bar{z} = \bar{x}$,

(4.4.22)   $\overline{w^2(t,\tau)} = w(\tau) + w^2(\tau)\,\frac{\overline{z^2}}{\bar{x}^2} - \frac{w^2(\tau)}{\bar{x}}$

then

(4.4.23)   $\sigma^2(\tau) = \overline{w^2(t,\tau)} - w^2(\tau) = w(\tau) + w^2(\tau)\,\frac{\overline{z^2} - \bar{x}^2}{\bar{x}^2} - \frac{w^2(\tau)}{\bar{x}} = w(\tau) + \frac{w^2(\tau)}{\bar{x}^2}\,(\sigma_z^2 - \bar{x})$

QED.

Corollary 4.5. The variance σ²(τ) is lower-bounded by

(4.4.24)   $\sigma^2(\tau) \ge w(\tau)\left(1 - \frac{w(\tau)}{\bar{x}}\right)$

Proof: For any random variable z, σ_z² ≥ 0, so put σ_z²=0 into the
expression above for σ²(τ).

QED.

Observe, from eq. 4.4.16 and 4.4.17, that

(4.4.25)   $\lim_{\tau \to \infty} \sigma^2(\tau) = \sigma_z^2$

and that σ²(τ) attains a maximum value for some τ>0.
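
The relation between the variance and its lower bound can be
illustrated with a short sketch. It assumes that the variance
expression of eq. 4.4.23 has the form w(τ) + (w²(τ)/x̄²)(σ_z² - x̄),
a reading that is consistent with the bound of Corollary 4.5 and
with the limit in eq. 4.4.25; the function names and the numbers are
hypothetical.

```python
def ws_variance(w: float, x_bar: float, var_z: float) -> float:
    """Assumed form of eq. 4.4.23: variance of the working set size.

    w is the expected working set size, x_bar the mean interreference
    interval (equal to the mean program size), var_z the variance of
    the program size.
    """
    return w + (w * w / (x_bar * x_bar)) * (var_z - x_bar)


def ws_variance_lower_bound(w: float, x_bar: float) -> float:
    """Corollary 4.5: the variance with var_z set to zero."""
    return w * (1.0 - w / x_bar)


# Hypothetical values: mean program size 50 pages, variance 9.
x_bar, var_z = 50.0, 9.0
for w in (5.0, 20.0, 40.0, 50.0):
    print(w, ws_variance(w, x_bar, var_z), ws_variance_lower_bound(w, x_bar))
```

When w reaches x̄ the bound falls to zero and the variance equals
σ_z², in agreement with the limit stated in eq. 4.4.25.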
4.5. Duty Factor

The duty factor η(τ) of a process is the fraction of time
it is able to spend computing:

(4.5.1)    $\eta(\tau) = \frac{\text{(elapsed virtual time)}}{\text{(elapsed virtual time)} + \text{(elapsed page wait time)}}$

η(τ) measures the ability of a process to use a processor.


Theorem 4.6. The duty factor η(τ) is given by

(4.5.2)    $\eta(\tau) = \frac{1}{1 + \lambda(\tau) T}$

where λ(τ) is the missing-page probability, and T is the
traverse time.

Proof: Suppose the process has executed for V vtu, with no
interruptions other than page waits. The time spent in page wait
is then (Vλ(τ)T) and so

           $\eta(\tau) = \frac{V}{V + V\lambda(\tau)T} = \frac{1}{1 + \lambda(\tau)T}$

QED.

The duty factor has already appeared in Section 3.6, on thrashing.
We may interpret η(τ) as the probability that, if we look
at a process at some random time, we find it running.
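
The argument of the proof can be mimicked by a small simulation: each
unit of virtual time incurs a page wait of length T with probability
λ, and the duty factor is the ratio of virtual time to elapsed real
time. This is a sketch under the assumption that page waits occur
independently from one virtual time unit to the next; the function
names, the seed and the numerical values are hypothetical.

```python
import random


def duty_factor(lam: float, T: float) -> float:
    """Eq. 4.5.2: fraction of real time a process spends computing."""
    return 1.0 / (1.0 + lam * T)


def simulate_duty_factor(lam: float, T: float, V: int, seed: int = 0) -> float:
    """Run V virtual time units; each one causes a page wait with probability lam."""
    rng = random.Random(seed)
    waits = sum(1 for _ in range(V) if rng.random() < lam)
    return V / (V + waits * T)


# Hypothetical values chosen so that lam * T = 1, i.e. a duty factor of 1/2.
lam, T = 0.01, 100.0
eta_formula = duty_factor(lam, T)
eta_sim = simulate_duty_factor(lam, T, V=200_000)
print(eta_formula, eta_sim)
```

The simulated value fluctuates around the formula and approaches it
as V grows.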
4.6. τ-sensitivity

It is useful to define a sensitivity function s(τ) that
measures how sensitive is the re-entry rate λ(τ) to changes in τ.
We define the τ-sensitivity s(τ) of a working set W(t,τ) to be

(4.6.1)    $s(\tau) = -\frac{d}{d\tau}\lambda(\tau) = f_x(\tau)$

That is, if τ is decreased by dτ, the resulting increase in
re-entries to W(t,τ) is s(τ) dτ. It is obvious that s(τ)≥0:
reducing τ can never reduce the page traffic.

Observe that s(τ) is the negative second derivative of w(τ),
and is therefore a measure of the convexity of w(τ).

s(τ) may be useful in deciding how small a value of τ to
choose. If f_x(τ) has the shape shown in Figure 4-5, curve A,
a good choice for τ is τ_1, since τ>τ_1 has little effect on
reducing s(τ). If f_x(τ) has the shape of curve B we should have
to choose τ=τ_2>τ_1 in order to have the same τ-sensitivity. There
is good reason to believe that in practice f_x(τ) is approximately
hyperexponential, in which case curve A is more representative
than curve B.
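
To illustrate how s(τ) might guide the choice of τ, the sketch below
evaluates the τ-sensitivity of a two-stage hyperexponential density
and finds, by bisection, the smallest τ at which s(τ) has fallen to a
desired level. The mixture weight, the two rates and the target level
are hypothetical, and the search assumes that s(τ) decreases
monotonically, which holds for this density.

```python
import math


def s_hyperexp(tau: float, a: float, b1: float, b2: float) -> float:
    """tau-sensitivity s(tau) = f_x(tau) for a two-stage hyperexponential density."""
    return a * b1 * math.exp(-b1 * tau) + (1.0 - a) * b2 * math.exp(-b2 * tau)


def smallest_tau_below(target, a, b1, b2, hi=1e6, iters=200):
    """Bisection for the smallest tau with s(tau) <= target (s is decreasing)."""
    lo = 0.0
    if s_hyperexp(lo, a, b1, b2) <= target:
        return lo
    if s_hyperexp(hi, a, b1, b2) > target:
        raise ValueError("target is not reached on [0, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if s_hyperexp(mid, a, b1, b2) <= target:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical program: 90% short intervals (rate 0.1), 10% long ones (rate 0.001).
a, b1, b2 = 0.9, 0.1, 0.001
tau_star = smallest_tau_below(1e-4, a, b1, b2)
print(tau_star, s_hyperexp(tau_star, a, b1, b2))
```

Beyond tau_star the sensitivity is dominated by the slowly decaying
stage, so further increases in τ buy little reduction in s(τ).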
[Figure 4-5. Using s(τ) to choose τ. The figure plots
s(τ) = f_x(τ) against τ.]
4.7. Choosing τ

Ideally, the working set parameter τ should be chosen as
small as possible and yet assure that the working set W_p(t,τ) of
process p contains p’s favored pages. In principle, then, τ should
be variable, from process to process and from time to time.

In practice, it will be necessary to choose non-ideal values
for τ, because the optimum value for τ may be indeterminable,
or because too much mechanism may be needed either to decide
on the required value of τ or else to vary τ dynamically as
required. Thus, system parameters, as well as program
parameters, will play roles in choosing τ.

Should τ be too small, the favored pages of a process will
be removed, resulting in high missing-page probability, high
memory-usage costs, high page traffic, and low efficiency (i.e.,
duty factors). Should τ be too large, pages may remain in
memory long after last being used, thus wasting memory and again
resulting in high memory-usage costs.

We shall attempt to clarify the nature of the tradeoffs
among all these factors.

Strange as it may seem, there may be a worst value for τ.
Suppose the process in question has executed for V vtu. The
expected number of page waits is Vλ(τ), the expected time spent
in page wait is Vλ(τ)T, and the expected elapsed real time is

(4.7.1)    $V + V\lambda(\tau)T = V\,(1 + \lambda(\tau)T)$

During this interval V, the expected working set size is w(τ),
so that the expected cost per unit virtual time is

(4.7.2)    $H(\tau) = \frac{w(\tau)\,V\,(1 + \lambda(\tau)T)}{V} = w(\tau)\,(1 + \lambda(\tau)T)$
In Figure 4-6 we have sketched w(τ), (1+λ(τ)T), and the product
H(τ). It is clear that H(τ) attains a maximum for some τ_0>0,
if (1+T)>x̄. If T>>1 it is not hard to see that τ_0 is very small,
and the value of τ chosen to permit inclusion of at least the
favored pages will satisfy τ>>τ_0. There are values of T
satisfying (1+T)<x̄ such that H(τ) has no maximum for finite τ, in
which case we need not worry about a worst value of τ.

Note that H(τ) has a maximum at τ_0, whereas the cost function
G(s) of Section 3.5.1, as a function of memory space s, has a
minimum. The apparent discrepancy is resolved if we note that
H(τ) cannot account for a memory holding larger than the expected
working set size w(τ), whereas G(s) can. The functions G(s)
and H(τ) are not the same cost function.
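
The worst value of τ can be located numerically. The sketch below
does so for exponentially distributed interreference intervals
(Section 4.9), for which H(τ) = x̄(1 - e)(1 + Te) with e = exp(-τ/x̄).
The closed form used for comparison, τ_0 = x̄ ln(2T/(T-1)) for T>1,
is our own derivation for this special case (set the derivative of H
with respect to e to zero) and does not appear in the text; all names
and numbers are hypothetical.

```python
import math


def cost_per_vt(tau: float, x_bar: float, T: float) -> float:
    """H(tau) = w(tau) * (1 + lambda(tau) * T) in the exponential case."""
    e = math.exp(-tau / x_bar)
    return x_bar * (1.0 - e) * (1.0 + T * e)


def worst_tau_grid(x_bar: float, T: float, tau_max: float, step: float) -> float:
    """Grid search for the tau in [0, tau_max] that maximizes H."""
    n = round(tau_max / step)
    best = max(range(n + 1), key=lambda k: cost_per_vt(k * step, x_bar, T))
    return best * step


def worst_tau_closed_form(x_bar: float, T: float) -> float:
    """Stationary point of H in the exponential case; requires T > 1."""
    return x_bar * math.log(2.0 * T / (T - 1.0))


# Hypothetical values: mean interreference interval 50 vtu, T = 1000 vtu.
x_bar, T = 50.0, 1000.0
tau_grid = worst_tau_grid(x_bar, T, tau_max=500.0, step=0.01)
tau_exact = worst_tau_closed_form(x_bar, T)
print(tau_grid, tau_exact, cost_per_vt(tau_grid, x_bar, T))
```

A grid search is used because it makes no assumption about the shape
of H(τ); for other interreference distributions only cost_per_vt
needs to be replaced.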

The remaining tradeoff issues fall into classes: those that
depend on the behavior of the program, and those that depend on
system requirements. The program-dependent considerations are:

1. Hard vs. Soft Programs (cf. Section 3.3, Figure 3-9).
Choose τ as small as possible, yet allow W(t,τ) to
contain the favored pages. A hard program has a well-defined
minimum value of τ, whereas a soft program does not.

2. τ-sensitivity (cf. Section 4.6). τ can be chosen so
that s(τ) is at some desired level, or that τ is at
the start of a flat region of the s(τ) curve.

The system-dependent considerations are:

3. Paging rate (cf. Sections 4.2, 4.3). τ can be chosen
so that the virtual time between page faults is comparable
to T; that is, so that 1/λ(τ) = T. This is
equivalent to the condition

           $p(\tau) = \frac{\lambda(\tau)}{1 + \lambda(\tau) T} = \frac{1}{2T}$

4. Duty factor (cf. Section 4.5). τ can be chosen so that
a given duty factor η(τ) is attained for each computation.

[Figure 4-6. Cost per unit virtual time H(τ). The figure sketches
w(τ), the factor (1 + λ(τ)T), which starts at 1+T, and their
product H(τ).]

Of course, τ should never be chosen less than whatever is
required to satisfy the program-dependent criteria, conditions 1
and 2. In most contemporary systems T is so large (T≈10000 vtu
= 10 ms.) that τ must be chosen to satisfy the system-dependent
criteria, conditions 3 and 4; this will generally cause τ to be
an order of magnitude or more greater than program-dependent
criteria would require.

Ideally we should like to have the flexibility to choose τ
according to the program-dependent criteria, without regard to
the system-dependent criteria. It should be clear that this is
achievable only when T becomes much smaller than is normal in
contemporary systems; for example, T less than 100 vtu. The
use of bulk core storage or some other non-rotating device for
the second level of memory can achieve this. We shall return
to these issues in Chapter 8.
4.8. Prediction

On several occasions we have noted that a working set
W(t,τ) is a reliable prediction of the working set W(t+α,τ), if
α is not too large; similarly the working set size w(t,τ) is
a reliable prediction of the working set size w(t+α,τ), if α
is not too large. Without going into great detail we want to
indicate how these ideas can be made more precise.

The prediction problem for working set sizes is:

     given:                   w(t,τ) for t∈I
     want to estimate:        w(t+α,τ) for α∉I
     the estimate is to be:   ŵ(t+α,τ) = G(w(s,τ)) for s∈I

Here, I is a set of time points at which the value of w(t,τ) is
known. This set I could consist of one or more distinct points,
of a time interval (t_1,t_2), or even the entire time history since
t=0. The transformation G is to be chosen so that ŵ(t+α,τ) is
an optimum (in terms of a given criterion) estimate of w(t+α,τ).

We assume that w(t,τ) is a stationary stochastic process;
hence we can write its expectation independent of time:

(4.8.1)    $w(\tau) = \overline{w(t,\tau)}$

and we can define the autocorrelation function between w(t,τ)
and w(t+u,τ) to be

(4.8.2)    $R(u,\tau) = \overline{w(t,\tau)\, w(t+u,\tau)}$

depending only on the separation u of the two times.

The most common form of prediction is least mean square
prediction, used because it is particularly easy to analyze.
Define the error of the estimate to be

(4.8.3)    $e(\alpha) = w(t+\alpha,\tau) - G(w(u,\tau)), \quad u \in I$

The problem is to choose the transformation G such that the mean
square error, $\overline{e^2(\alpha)}$, is minimum. Clearly, the smallest mean
square error can be obtained if we impose no restrictions on G
(non-linear mean square prediction). This, however, leads to
practical and analytic difficulties, so G is usually restricted
to be a linear operator. When G is a linear operator, the mean
square error $\overline{e^2(\alpha)}$ is minimum if and only if the error e(α) is
orthogonal to all the given data (this result is well known; see,
for example, reference [P1, p. 389]); that is,

(4.8.4)    $\overline{[w(t+\alpha,\tau) - G(w(u,\tau))]\, w(v,\tau)} = 0$   for each v∈I


The most convenient linear operator is a linear combination
of a finite number of data. That is, w(t,τ) is known at the
time instants t_1,...,t_n, and the estimate is to be a linear
combination

(4.8.5)    $\hat{w}(t+\alpha,\tau) = A_1 w(t_1,\tau) + \dots + A_n w(t_n,\tau) + A_{n+1}$

The constants A_1,...,A_{n+1} must be chosen to satisfy eq. 4.8.4;
that is, so that

(4.8.6)    $\overline{[w(t+\alpha,\tau) - (A_1 w(t_1,\tau) + \dots + A_n w(t_n,\tau) + A_{n+1})]\, w(t_i,\tau)} = 0$

for i=1,...,n, and

(4.8.7)    $\overline{[w(t+\alpha,\tau) - (A_1 w(t_1,\tau) + \dots + A_n w(t_n,\tau) + A_{n+1})]\, A_{n+1}} = 0$

If one expands these for each i, one obtains (n+1) equations in
(n+1) unknowns (the A_i) with coefficients of the form

(4.8.8)    $\overline{w(t_i,\tau)\, w(t_j,\tau)} = R(t_i - t_j, \tau)$

which follows from eq. 4.8.2.

We hope to have indicated with this overview how the problem
of predicting working set sizes might be made more precise,
and how an error can be determined for a given estimate and time
separation α. We refer the reader to the literature for further
detail [P1, p. 385ff.].
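
For the simplest case, a single datum (n=1), the orthogonality
conditions can be solved by hand: A_1 = (R(α) - w̄²)/(R(0) - w̄²) and
A_2 = w̄(1 - A_1), where w̄ is the mean working set size. The sketch
below applies this to a record of working set sizes sampled at equal
intervals, with the expectations replaced by sample averages. That
replacement, the function names and the sample record are all
assumptions made for illustration.

```python
def fit_one_point_predictor(series, lag):
    """Least mean square coefficients (A1, A2) for w(t+lag) ~ A1*w(t) + A2.

    Expectations are replaced by sample averages over the record, which
    presumes the record is long enough to be representative.
    """
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    r0 = sum(v * v for v in series) / n                   # estimate of R(0)
    pairs = n - lag
    r_lag = sum(series[k] * series[k + lag] for k in range(pairs)) / pairs
    var = r0 - mean * mean
    if var <= 0:
        raise ValueError("series has no variance; A1 is undetermined")
    a1 = (r_lag - mean * mean) / var
    a2 = mean * (1.0 - a1)
    return a1, a2


def predict(a1, a2, w_now):
    """Estimate of the working set size one lag ahead."""
    return a1 * w_now + a2


# Hypothetical record: working set size alternating between 10 and 12 pages.
record = [10, 12] * 5
a1, a2 = fit_one_point_predictor(record, lag=1)
print(a1, a2, predict(a1, a2, 10))
```

For this alternating record the fitted predictor maps 10 to 12 and 12
to 10, as one would expect; with more data points the same conditions
lead to a system of (n+1) linear equations.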
4.9. Example

It is interesting to examine the results of the previous
sections in the case of exponentially distributed interreference
intervals:

           $F_x(u) = 1 - e^{-\beta u}, \qquad \beta = \frac{1}{\bar{x}}$

we have, for the major expressions:

Name                          Symbol     Result, exponential case

missing-page probability      λ(τ)       $e^{-\beta\tau}$
mean vt interval between
  re-entries                  1/λ(τ)     $e^{\beta\tau}$
mean real time interval
  between re-entries          1/p(τ)     $e^{\beta\tau} + T$
duty factor                   η(τ)       $\frac{1}{1 + T e^{-\beta\tau}}$
expected working set size     w(τ)       $\bar{x}\,(1 - e^{-\beta\tau})$

Now, suppose we have chosen τ so that

           $\frac{1}{\lambda(\tau)} = T$

then

           $\tau = \frac{1}{\beta} \ln T$

For this choice of τ we have

           $p(\tau) = \frac{1}{2T}$

           $\eta(\tau) = \frac{1}{2}$

           $w(\tau) = \bar{x}\,\frac{T-1}{T}$
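
The entries of the table are easy to tabulate. The sketch below
evaluates them for given x̄, T and τ, and then checks the special
choice of τ, assumed here to be the one for which 1/λ(τ) = T (that
is, τ = x̄ ln T). The dictionary keys and the numerical values are
hypothetical.

```python
import math


def exponential_case(x_bar: float, T: float, tau: float) -> dict:
    """Major quantities of Chapter 4 for exponential interreference intervals."""
    beta = 1.0 / x_bar
    lam = math.exp(-beta * tau)                 # missing-page probability
    return {
        "lam": lam,
        "vt_between_reentries": 1.0 / lam,      # mean virtual time between re-entries
        "rt_between_reentries": 1.0 / lam + T,  # mean real time between re-entries
        "paging_rate": lam / (1.0 + lam * T),
        "duty_factor": 1.0 / (1.0 + lam * T),
        "ws_size": x_bar * (1.0 - lam),         # expected working set size
    }


# Hypothetical values: mean interval 50 vtu, traverse time 10000 vtu.
x_bar, T = 50.0, 10_000.0
tau = x_bar * math.log(T)                       # assumed choice: 1/lam(tau) = T
r = exponential_case(x_bar, T, tau)
print(tau, r)
```

With these numbers τ is roughly 460 vtu, the duty factor is one half,
and the expected working set size is within one part in 10000 of x̄.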
4.10. Working Sets and Parachors

Belady has defined a unit of storage allocation, the
parachor [B2], which is that amount of information that must be in
main memory so that a program spends no more than half its time
in page wait. If we choose τ so that

           $\frac{1}{\lambda(\tau)} = T$

we find that the duty factor is η(τ) = 1/2. Hence the expected
working set size for this value of τ corresponds to one parachor.
In the exponential case, this is

           $w(\tau) = \bar{x}\left(\frac{T-1}{T}\right)$

Allocating one parachor to each program is the same as allocating
enough space for its expected working set size. The parachor
is a static unit of allocation, whereas the working set size
w(t,τ) is a dynamic unit of allocation. Our results in Chapter 3
show that working set strategies should perform better than
parachor strategies (a parachor strategy is one that runs a process
if and only if there is at least one uncommitted parachor of
main memory).
4.11. Implementation of Working Set Memory Management

According to our definition, W(t,τ) is the set of pages
a process has referenced within the last τ vtu of its execution.
This suggests that memory management can be controlled with
hardware mechanisms, by associating with each page-block of main
memory a timer. Whenever a page is referenced, its timer is set
to τ and begins to run down; if the timer succeeds in running
down, a flag is set to mark the page for removal from main memory
whenever the space is needed.

Unfortunately matters are not so simple. According to the
definition of W(t,τ), the timers must run down in virtual time.
Virtual time coincides with real time only when the process is
running. More precisely, the timer behavior should be as follows,
for each process state:

1. running. A timer may run down in real time.

2. page wait. Since the process is temporarily suspended,
all timers on its working set pages must be stopped,
else they may run down and working set pages may be
removed during a page wait.

3. ready and blocked. If a process is pre-empted by the
operating system, or blocks, its timers may continue to
run down in real time; then, within τ vtu, the memory
it formerly occupied will be freed.

We can see that it is the page wait state that gives the trouble.
Whenever a process enters page wait its page timers must stop
until the new page is acquired. For other process states, the
page timers may run in real time. Therefore we shall associate
with each page-block in main memory the name of the process
that has most recently referenced it; when the process enters
page wait, all its pages can have their timers suspended.

The following procedure is useful in a software as well as
a hardware implementation, and is therefore potentially
applicable in contemporary systems. The procedure we propose here
samples use bits associated with each page; these use bits may
be part of a page table entry or part of a hardware register.
Sampling occurs at intervals of σ vtu, σ being called the
sampling interval, where σ=τ/K, and K is an integer constant
chosen to make the sampling intervals as fine as desired (K=2
or 3 should be sufficient). On the basis of page references
during each of the last K sampling intervals, the working set
W(t,Kσ) can be determined.

There is a sequence of use bits u_0,u_1,...,u_K associated
with each page. Whenever a reference occurs, 1→u_0. At the
end of each sampling interval, the bit pattern contained in
u_0,u_1,...,u_K is shifted one position, a 0 enters u_0, and u_K
is discarded:

           u_{K-1} → u_K
              ...
           u_0 → u_1
           0 → u_0

Then the logical sum U of the use bits

           U = u_0 + u_1 + ... + u_K

is U=1 if and only if the page in question has been referenced
during the last K sampling intervals; of all the pages associated
with a process, those with U=1 constitute its working set W(t,Kσ).
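
The shift procedure can be sketched in a few lines. The class below
keeps the bits u_0,...,u_K of one page in a list and takes the logical
sum U to be the OR of all of them, which is our reading of the
description above; the class and method names are hypothetical.

```python
class PageUseBits:
    """Use bits u_0..u_K of one page, sampled once per sampling interval."""

    def __init__(self, K: int):
        self.K = K
        self.bits = [0] * (K + 1)       # bits[j] holds u_j

    def reference(self) -> None:
        """A reference to the page sets u_0."""
        self.bits[0] = 1

    def end_of_interval(self) -> None:
        """Shift one position: u_{K-1} -> u_K, ..., u_0 -> u_1, 0 -> u_0."""
        self.bits = [0] + self.bits[:-1]    # the old u_K is discarded

    def in_working_set(self) -> bool:
        """Logical sum U of the use bits (assumed to include every bit)."""
        return any(self.bits)


# A page referenced once, then left idle, with K = 2.
page = PageUseBits(K=2)
page.reference()
history = []
for _ in range(3):
    page.end_of_interval()
    history.append(page.in_working_set())
print(history)
```

In this sketch an idle page remains in the working set through K
shifts and leaves it at the next one.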
Figure 4-7 shows how this idea might be implemented in
hardware. If process j is currently using the page, the π-field
of the page register contains an identifier to j. The PW bit is
0 if and only if j is in page wait. The PT-field points to the
page table entry designating this page. The σ-bus is pulsed
once every σ vtu; these pulses cause a shift in the use bits
if and only if PW=1 (the process is not in page wait). Whenever
the logical sum U of use bits becomes 0, a mechanism (not shown)
may (not must) remove the page from main memory; this mechanism
will dispatch the page to auxiliary memory (unless it has not
been modified and there is a spare copy already in auxiliary
memory), and then (using PT) find the page table entry for this
page and set the in-core bit OFF. All this is done without
troubling the operating system.

[Figure 4-7. Hardware implementation of memory management. A
typical page register in main memory holds the process name π,
the page table entry pointer PT, the page wait indicator PW
(PW=1 if and only if π is not in page wait), and the use bits,
which are set by each reference and shifted by the σ-bus, a
stream of pulses arriving once each σ vtu.]

This mechanism maintains a count of the working set size
for each process as follows. Whenever a fresh page of process
π is loaded by the operating system (in response to a page fault),
increment a counter for the process π. Whenever the logical
sum U of use bits becomes 0 for some page marked as belonging
to process π, decrement the counter for process π.

It is interesting to note that τ=Kσ may be varied if
desired by varying σ. The operating system thus has control over
the current value of τ.

This basic scheme can also be realized in software, as
suggested by Figure 4-8. All processes in the running state
are identified in the running list. Upon entry to the running
state, process i is assigned some quantum q_i. A process cycles
through the list, receiving a burst σ (σ is the sampling interval)
at each pass; the quantity y_i records its time used. There is
a special process, the checker; whenever run, the checker looks
at the page tables of processes run since the last time it was
run, and performs the use-bit shift discussed above (the use
bits u_0,u_1,...,u_K are stored in the page tables, so the π, PW,
and PT fields of Figure 4-7 are no longer necessary).

[Figure 4-8. Software implementation of memory management. The
figure shows the running list, which includes the checker;
processes enter from the ready state with quantum q_i and leave
to the ready state (burst over), to the blocked state (blocked),
or to page wait.]

Associated with each process is a counter w_i giving its
current working set size. At each page fault for process i,
w_i is increased by one. If the checker observes a page leave
the working set of process i, w_i is decreased by one.

It should be clear that, if the length of the running list
is n, the checker samples page use bits only every nσ seconds,
not every σ seconds.

This implementation is also discussed in reference [D4].
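
The software scheme might be sketched as follows. Each process keeps
the use bits of its resident pages together with its working set size
counter; the checker shifts the bits of every process it is given and
adjusts the counters. For simplicity the sketch removes a page as soon
as its use bits are all zero, whereas the text only permits (and does
not require) removal at that point. All names are hypothetical.

```python
class Process:
    """Per-process page table with use bits and a working set size counter."""

    def __init__(self, name: str, K: int):
        self.name = name
        self.K = K
        self.pages = {}      # page name -> use bits [u_0, ..., u_K]
        self.ws_size = 0     # the counter w_i

    def reference(self, page: str) -> None:
        if page not in self.pages:          # page fault: load the page
            self.pages[page] = [0] * (self.K + 1)
            self.ws_size += 1
        self.pages[page][0] = 1             # 1 -> u_0


def checker(processes_run) -> None:
    """Shift the use bits of each process run since the last check."""
    for proc in processes_run:
        for page in list(proc.pages):
            bits = [0] + proc.pages[page][:-1]
            if any(bits):
                proc.pages[page] = bits
            else:                           # page has left the working set
                del proc.pages[page]
                proc.ws_size -= 1


# One process, K = 1: pages a and b are referenced, then only a.
p = Process("p1", K=1)
p.reference("a")
p.reference("b")
checker([p])
p.reference("a")
checker([p])
print(p.ws_size, sorted(p.pages))
```

After the second pass of the checker, page b has dropped out and the
counter has been reduced to one.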
4.12. Summary

We refined the working set model, deriving expressions for
the missing-page probability λ(τ), the paging rate p(τ), the
expected working set size w(τ), the variance of the working
set size σ²(τ), the duty factor η(τ), and the τ-sensitivity s(τ).
Each of these depends on the interreference distribution F_x(u),
and some of them depend also on the traverse time T. We showed
how each of these plays a role in selecting a value for τ.

We discussed the problem of prediction, showing general
methods whereby errors may be determined precisely for a given
mode of estimation.

We discussed implementation of working set memory allocation
strategies, both for hardware and software.

All this was done in the absence of sharing. In the next
chapter we investigate the effects of sharing and show
quantitatively that great benefits are attainable.




CHAPTER 5 


Multiprocess Information Sharing 


5.0. Introduction 

In this chapter we complete the characterization of the
working set model, by investigating the effects of sharing.
Intuition already tells us that sharing should produce an
all-around improvement. Our purpose here is to give quantitative
justification to this well known premise.

Under the existing definition, working sets will overlap
when their processes share information. This complicates the
problem of charging for main memory usage of shared information,
because the number of overlaps among shared working sets (there
can be as many as (2^n-1) overlaps among n working sets), their
sizes, and their contents, may be unknown or at best exceedingly
difficult to determine. A minor modification of the definition
makes working sets disjoint, thereby relieving these difficulties.

We consider a simple conceptual experiment: n independent
processes referencing n identical programs compared to n
independent processes referencing one program. We derive expressions
for working set size, missing-page probability, paging rate, and
duty factor. We show that sharing produces improvement in each
of these quantities.

The discussion in this chapter is not intended to
solve the problems of sharing information. We only hope to shed
light on the difficulties of the problem and to give insights
into possible solutions.

5.1. Sharing


5.1.1. General Aspects


The smallest unit of information that can be shared is,
from a process’s point of view, a segment, because the protection
mechanism operates on a segment level. The smallest unit of
information that can be shared is, from the system’s point of view,
a page, because memory allocation is handled on a page level.

We follow Arden’s suggestions for program structure [A1].

If a segment is shared, there will be an entry for it in the
segment table of each participating process. Each such entry need
not assign the same name to the segment. Each such entry,
however, points to the same page table. Thus, each physical segment
has exactly one page table describing it.

The problem of charging participants for the use of shared 
information can be handled at two levels: shared information in 
main memory, and shared information not in main memory. 

Ideally, we should like each participant in sharing of in- 
formation which resides in main memory to be charged in accor- 
dance with his degree of participation. Even though this may 
not be easy to implement, an extension (Section 5.1.2) of the 
working set concept can give insights into how this might be done, 
and how an implementation can approximate this ideal. 

When working sets overlap, the existing working-set
definition leads to the following difficulty. Suppose computation C
contains two processes, designated 1 and 2, which are sharing
information. Then

           $W_1(t,\tau) \cap W_2(t,\tau) \ne \emptyset$

where ∅ is the empty set. Then the joint working set for
computation C is

           $W_C(t,\tau) = W_1(t,\tau) \cup W_2(t,\tau)$

and the working set size of C is

           $w_C(t,\tau) < w_1(t,\tau) + w_2(t,\tau)$

Thus, measuring the individual working set sizes of the
component processes of a computation will lead to an overestimate of
the true joint working set size. When there is much sharing,
working sets will be very nearly coincident; summing the sizes
of each process’s working set will grossly overestimate the
true joint working set size. This can seriously complicate the
problem of attributing memory-usage charges to the participants.

In the next section we shall introduce an alternative
working set definition that facilitates the accounting and billing
procedures by making working sets always disjoint.

The method most frequently proposed for handling charges
on the non-main-memory shared information is based on a concept
of ownership. Each segment is assigned exactly one owner.
Anyone wishing to use another’s segment must make arrangements to
do so with the owner. The owner is charged for use of the
segment, regardless of who is actually using it; he is in turn
paid royalties by borrowers, these fees fixed to defray those
expenses charged to him because borrowers have used his segment¹.
One of the chief motivations for this method is simplicity of
implementation: because an arbitrarily large and unpredictably
varying number of processes may wish to share a single segment,
it can become unbelievably complex to keep track of, and
attribute charges to, every participant. If this method is used for
information shared in main memory, the following inequity will
result. Two users sharing the same segment both pay the same
fee to the owner (there is no way to determine in advance how
much a given user will use it); yet one user may use it
sparingly, the other heavily. Thus, costs of sharing may not be
distributed fairly.

It is apparent that, if the owner method is used, it must
be augmented in order to distribute main memory costs more
equitably.

¹The owner method of charging for sharing very much resembles
copyrights. A similar problem is the so-called proprietary
software problem, in which a firm or user may lease programs
to other firms or users.


5.1.2. Refinement of The Working Set Definition 


The basic idea we use here is: rather than associate with 
each process the pages it has most recently referenced, we 
associate each page with the process that has most recently 
referenced it. 

Page i belongs to the working set W_p(t,τ) of process p
if and only if:

1. p has referenced i most recently at time s in its
virtual time interval (t-τ,t),
and
2. no other process has referenced i in p’s virtual time
interval (t-s,t).

Thus,

           $W_p(t,\tau) = \{\, i \mid \text{the most recent reference to } i \text{ originated from process } p, \text{ in } p\text{'s vt interval } (t-\tau,t) \,\}$

This definition has two consequences:

1. Disjoint working sets. A page is in at most one working
set. Therefore if w_p(t,τ) is the working set size for each
process p, and if Q is any collection of processes,
then the size of their joint working set is

           $w_Q(t,\tau) = \sum_{p \in Q} w_p(t,\tau)$

We may therefore compute the memory demand of any
collection of processes simply by adding their working
set sizes.

2. Fair distribution of costs. Suppose processes p_1,...,p_k
have been sharing page i, independently, for some
interval of time, and let n_j denote the number of
references process p_j made to page i. Then, on the average,
page i spends a fraction

(5.1.1)    $f_j = \frac{n_j}{n_1 + \dots + n_k}$

of its time in the working set W_{p_j}(t,τ), and has
contributed f_j to the size of that working set. Thus,
a participant is charged in accordance with his degree
of participation.

This last relation, eq. 5.1.1, holds only if p_1,...,p_k behave
independently. If the shared information is modifiable and
protected by interlocks, then the likelihood of correlation is
very high. In general, there is no easy way to determine how
an interlocked, modifiable piece of data will affect whatever
processes attempt to use it, because of data dependence and
arbitrary timing.
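
Both consequences can be illustrated with a short sketch. The first
function assigns each page to the process that referenced it most
recently, so that the per-process sizes add up to the number of
distinct pages; the second measures what fraction of an interval a
shared page is held by each process. The sketch ignores the expiry of
pages after τ vtu and treats all processes as sharing one time axis,
both simplifications of ours; the names and reference strings are
hypothetical.

```python
from collections import Counter


def disjoint_ws_sizes(references):
    """references: (page, process) pairs in order of occurrence.

    Returns (sizes, owner): sizes counts the pages held by each process
    when every page belongs to its most recent referencer.
    """
    owner = {}
    for page, proc in references:
        owner[page] = proc
    return Counter(owner.values()), owner


def holding_fractions(timed_refs, end_time):
    """timed_refs: (time, process) references to one shared page, sorted by time.

    Returns the fraction of [first reference, end_time] during which each
    process holds the page.
    """
    next_times = [t for t, _ in timed_refs[1:]] + [end_time]
    held = Counter()
    for (t, proc), nxt in zip(timed_refs, next_times):
        held[proc] += nxt - t
    total = end_time - timed_refs[0][0]
    return {proc: span / total for proc, span in held.items()}


sizes, owner = disjoint_ws_sizes([("a", "p1"), ("b", "p1"), ("a", "p2")])
fractions = holding_fractions([(0, "p1"), (1, "p2"), (2, "p1"), (3, "p1")], 4)
print(sizes, fractions)
```

With equally spaced references the fractions reduce to n_j divided by
the total number of references, as in eq. 5.1.1: here p1 made three of
the four references and holds the page three quarters of the time.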
5.1.3. Implementation


The implementation of Section 4.11 and Figure 4-7 remains
unchanged. Refer now to Figure 5-1. The π-field shown there
in the page register, whose contents designate the process whose
working set contains the page, now is loaded at each reference
with the name of the process making the reference.

To be more precise, an information reference is a pair
(i,p) where i is the name of the page being referenced and p is
the name of the process making the reference. The only
modification of Figure 4-7 is simply that p is loaded into the π-field
of the page register as the reference is made.

If τ<T, the following difficulty arises. If the process
named in the π-field enters page wait, we must be sure that
another process does not borrow the page and then discard it
before the π-field process completes its page wait. For example,
suppose at time t process 1 enters page wait. If
process 2 (which is not in page wait during the interval (t,t+T))
references a page in process 1’s working set just once in
the interval (t,t+T-τ), the page will exit process 2’s working
set before process 1 terminates page wait, and will not be
available for use by process 1.

There is no easy solution to this difficulty. One
possibility is to choose τ>T; but if T depends on the rotation time
of a device, this may result in undesirably large values of τ.
Another possibility (shown in Figure 5-1) is to prevent a change
in the contents of the π-field when the process named there is
in page wait; but then other processes may obtain references
without paying.

[Figure 5-1. Implementation of shared memory management. Diagram
not reproduced; recoverable labels: page table entry pointer,
typical page, MAIN MEMORY, $(i,p)$ reference, and a flag PW
(PW=1 if and only if the process named in the $\pi$-field is not
in page wait).]

[Figure 5-2. Experiment to investigate sharing. Diagram not
reproduced; it contrasts SYSTEM 1 (n processes, memory $M_1$)
with SYSTEM 2 (n processes, memory $M_2$).]

System 1 represents the completely unshared case, and is
the poorest performance situation. System 2 represents the
completely shared case, the best performance situation.

In the following sections, we evaluate the quantities
described in the following table.

quantity                    symbol              description

expected working set size   $\omega(\tau,n)$    expected number of pages in the
                                                joint working set $W(t,\tau,n)$
                                                of n processes.

missing-page probability    $\lambda(\tau,n)$   probability that a process
                                                references a page not in the
                                                joint working set $W(t,\tau,n)$.

paging rate                 $\phi(\tau,n)$      number of pages per unit real
                                                time re-entering $W(t,\tau,n)$
                                                on behalf of one process.

duty factor                 $\eta(\tau,n)$      fraction of time a running or
                                                page wait process is running,
                                                when (n-1) others share the
                                                program Z with it.



In each case we show that sharing is an improvement. That is,
for $n>1$ and $\tau>0$ we show that

           $\lambda(\tau,n) < \lambda(\tau,1)$
           $\phi(\tau,n) < \phi(\tau,1)$
           $\eta(\tau,n) > \eta(\tau,1)$
and
           $\omega(\tau,n)/n < \omega(\tau,1)$

This last relation differs slightly from the others for the
following reason. There may exist a shared page that no one
process references often enough to keep continuously in main
memory, but that, together, the n processes reference often
enough to keep it continuously in main memory. Thus, the joint
working set will be larger than any single working set:
$\omega(\tau,n) > \omega(\tau,1)$. In the shared case, however, each
process pays for $1/n$ of the memory used, so its expected cost
depends on $\omega(\tau,n)/n$. By showing $\omega(\tau,n)/n < \omega(\tau,1)$
we show that sharing reduces costs.

5.3. Shared Working Set Size and Memory Costs

In Figure 5-2, let $W(t,\tau,n)$ denote the joint working set
of the n processes, and define

(5.3.1)    $\omega(\tau,n)$ = expected size of $W(t,\tau,n)$

When $n=1$, $\omega(\tau,1)$ is exactly the expected working set size
discussed in Section 4.4. We shall obtain an expression for
$\omega(\tau,n)$ and show that the expected memory cost $\omega(\tau,n)/n$
for one process is diminished for increased n.


Theorem 5.1. Let n statistically independent processes be
simultaneously sharing the same program Z, whose size is fixed
at z. The interreference distribution $F_x(u)$ is the same
for each process, and is unchanging. Define the integral

(5.3.2)    $I(\tau) = \frac{1}{\bar{x}} \int_0^{\tau} (1 - F_x(u))\,du$

Then the expected size of the joint working set of the
n processes is

(5.3.3)    $\omega(\tau,n) = z\,[1 - (1 - I(\tau))^n]$

Discussion: Note that, for $n=1$, we have

           $\omega(\tau,1) = z\,[1 - (1 - I(\tau))] = z\,I(\tau) = \int_0^{\tau} (1 - F_x(u))\,du$

where we have used $z=\bar{x}$ from eq. 5.2.1 (and Theorem 4.1). Thus,
the expression reduces to that of Theorem 4.4, in the unshared
case.

Proof of Theorem 5.1: We follow an argument similar to that of
Theorem 4.4, in which we derived the expected working set size
for one process. Let $W_j(t,\tau)$ be the working set of process $P_j$,
and let $\omega_j(t,\tau)$ be its size. Define the binary random variable

(5.3.4)    $\gamma_i = 1$ if page i is in $W_j(t,\tau)$, and $\gamma_i = 0$ otherwise

Observe that $\gamma_i=1$ if and only if $P_j$ referenced page i in
$(t-\tau,t)$. Then

(5.3.5)    $\omega_j(t,\tau) = \sum_{i \in Z} \gamma_i$

and

(5.3.6)    $\omega(\tau) = \overline{\omega_j(t,\tau)} = \sum_{i \in Z} \overline{\gamma_i} = z \Pr[\gamma_i=1]$

Now, from eqs. 4.4.8 and 4.4.9 we have

(5.3.7)    $\Pr[\gamma_i=0] = 1 - I(\tau)$
           $\Pr[\gamma_i=1] = I(\tau)$

where $I(\tau)$ stands for

(5.3.8)    $I(\tau) = \frac{1}{\bar{x}} \int_0^{\tau} (1 - F_x(u))\,du = \frac{\omega(\tau)}{\bar{x}}$

Now, let $W(t,\tau,n)$ stand for the joint working set of the
n processes, and $\omega(t,\tau,n)$ denote its size. Define the binary
random variable

(5.3.9)    $\delta_i = 1$ if page i is in $W(t,\tau,n)$, and $\delta_i = 0$ otherwise

Observe that $\delta_i=1$ if and only if some one of the processes
$P_1,\ldots,P_n$ has referenced page i in the interval $(t-\tau,t)$. Then

(5.3.10)   $\omega(t,\tau,n) = \sum_{i \in Z} \delta_i$

and

(5.3.11)   $\omega(\tau,n) = \overline{\omega(t,\tau,n)} = \sum_{i \in Z} \overline{\delta_i} = z \Pr[\delta_i=1]$

We must find $\Pr[\delta_i=1]$. Now, the n processes are statistically
independent; then

           $\Pr[\delta_i=0]$ = Pr[no reference from any of n processes
                               to page i in the interval $(t-\tau,t)$]
                             = (Pr[no reference from one process
                               to page i in the interval $(t-\tau,t)$])$^n$
                             = $(\Pr[\gamma_i=0])^n$

(5.3.12)   $\Pr[\delta_i=0] = (1 - I(\tau))^n$

thus,

(5.3.13)   $\Pr[\delta_i=1] = 1 - (1 - I(\tau))^n$

Putting this into eq. 5.3.11, we obtain

           $\omega(\tau,n) = z\,[1 - (1 - I(\tau))^n]$

QED.
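
The expression of Theorem 5.1 is easy to evaluate numerically. The sketch below is an illustration only: the function names I and omega, the trapezoidal integration, and the exponential interreference distribution with $\bar{x} = z = 100$ are assumptions chosen for the example, not quantities taken from the text.

```python
import math
from typing import Callable


def I(tau: float, F: Callable[[float], float], xbar: float, steps: int = 10_000) -> float:
    """Eq. 5.3.2: (1/xbar) * integral from 0 to tau of (1 - F(u)) du (trapezoidal rule)."""
    h = tau / steps
    inner = sum(1.0 - F(k * h) for k in range(1, steps))
    area = h * (0.5 * (1.0 - F(0.0)) + inner + 0.5 * (1.0 - F(tau)))
    return area / xbar


def omega(tau: float, n: int, z: float, F: Callable[[float], float], xbar: float) -> float:
    """Eq. 5.3.3: expected size of the joint working set of n processes."""
    return z * (1.0 - (1.0 - I(tau, F, xbar)) ** n)


# Assumed example: exponential interreference intervals, with z = xbar as in the text.
XBAR = Z = 100.0
F_exp = lambda u: 1.0 - math.exp(-u / XBAR)

for n in (1, 2, 4, 8):
    w = omega(50.0, n, Z, F_exp, XBAR)
    print(f"n={n}: omega={w:6.2f}  per-process share={w / n:6.2f}")
```

For this assumed distribution $I(\tau) = 1 - e^{-\tau/\bar{x}}$ in closed form, so the numerical integral can be checked directly.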


Define the expected memory usage of one of the n processes
to be

(5.3.14)   $m(\tau,n) = \frac{\omega(\tau,n)}{n}$

so that $m(\tau,n)$ measures the expected cost per unit virtual time
attributed to one of the n processes.

Theorem 5.2. Let $m(\tau,n) = \omega(\tau,n)/n$ be the expected memory usage
of one of the n processes, where $\omega(\tau,n)$ is given by Theorem
5.1, and depends only on the interreference distribution
$F_x(u)$. Then for $\tau>0$ and $n>1$,

(5.3.15)   $m(\tau,n) < m(\tau,1)$

Discussion: Theorem 5.2 asserts that sharing reduces costs.
This result is very strong, for it depends only on the arbitrarily
given distribution $F_x(u)$. In other words, whenever two or more
processes are sharing the same program Z, provided that Z
remains fixed, that $F_x(u)$ remains fixed, and that the processes
are run concurrently, the shared expected memory usage costs
are always less than the unshared memory usage costs. Put
another way, sharing is always an improvement under the stated
conditions, regardless of program behavior¹.



¹ The reader might think there are counterexamples. For example,
let the n processes share an interlocked section. A process
tests the interlock: if the interlock is ON, the process creates
an enormous amount of data; if the interlock is OFF, the process
turns it ON and works in the interlocked section. It is clear
that n distinct copies require less memory than one shared copy,
because in the shared case (n-1) processes will find the
interlock ON, whereas in the unshared case no process finds the
interlock ON. This violates the assumptions of the theorem,
because the program size is not fixed in both cases, and because
the interlock violates the assumption of statistical independence.
Thus, this is not a counterexample.

Proof of Theorem 5.2: To prove $m(\tau,n) < m(\tau,1)$ we must show that

(5.3.16)   $\frac{\omega(\tau,n)}{n} < \omega(\tau,1)$

From Theorem 5.1, this is the same as showing

(5.3.17)   $\frac{1 - (1 - I(\tau))^n}{n} < I(\tau)$      all $\tau>0$, $n>1$

This requires that

(5.3.18)   $1 - \frac{1 - (1 - I(\tau))^n}{n\,I(\tau)} > 0$

or equivalently that

(5.3.19)   $1 - \frac{1 - (1 - I(\tau))^n}{n\,(1 - (1 - I(\tau)))} > 0$

This expression is of the form

(5.3.20)   $1 - \frac{1 - A^n}{n\,(1 - A)} > 0$,      $A = 1 - I(\tau) < 1$

Now, using the fact that

(5.3.21)   $\frac{1 - A^n}{1 - A} = 1 + A + \cdots + A^{n-1} < n$

which follows from $A<1$, we have

(5.3.22)   $1 - \frac{1}{n} \cdot \frac{1 - A^n}{1 - A} > 0$

and the inequality is proved.

QED.
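
The key step of the proof, eq. 5.3.21, can be checked numerically. In the sketch below the function name cost_ratio and the sample values of A are our own choices for illustration; the ratio computed is $m(\tau,n)/m(\tau,1)$ written as the mean of the geometric terms $1, A, \ldots, A^{n-1}$.

```python
def cost_ratio(A: float, n: int) -> float:
    """m(tau,n) / m(tau,1) = (1 + A + ... + A**(n-1)) / n, where A = 1 - I(tau)."""
    return sum(A ** k for k in range(n)) / n


# A = 1 - I(tau) lies in [0, 1) whenever I(tau) > 0.
for A in (0.0, 0.5, 0.9, 0.99):
    ratios = [round(cost_ratio(A, n), 4) for n in (1, 2, 5, 20)]
    print(f"A={A}: {ratios}")
```

Each term after the first is smaller than 1, so the mean falls below 1 as soon as $n>1$; the ratio approaches 1 only as A approaches 1, that is, as $I(\tau)$ approaches 0.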


In Figure 5-2, define $M_1$ to be the total expected memory
requirement in system 1, and $M_2$ to be the total expected memory
requirement in system 2. We have the following rather obvious
corollary to Theorem 5.2.

Corollary 5.2. $M_1 > M_2$.

Proof: By Theorem 5.2,

           $M_1 = n\,\omega(\tau,1) > \omega(\tau,n) = M_2$

QED.

Corollary 5.2 asserts simply that sharing reduces the overall
memory usage, resulting in more memory for other programs to use.

5.4. Missing-Page Probability

Define the missing-page probability to be:

(5.4.1)    $\lambda(\tau,n)$ = Pr[a given process references a missing page,
                              when (n-1) others share the same program]

Thus, $\lambda(\tau,n)$ is the probability that a process directs a
reference to a page not in the joint working set $W(t,\tau,n)$. Using
reasoning similar to that of Theorem 4.2, we can see that $\lambda(\tau,n)$
may be regarded as the number of pages per unit virtual time
re-entering the joint working set $W(t,\tau,n)$, on behalf of one of
the n processes. That is, the expected virtual time interval
between the page faults of one process is $1/\lambda(\tau,n)$.


Theorem 5.3. Let $\lambda(\tau,n)$ be the missing-page probability as
just defined, and let $F_x(u)$ be the interreference
distribution. Then

(5.4.2)    $\lambda(\tau,1) = 1 - F_x(\tau)$

and

(5.4.3)    $\lambda(\tau,n) = \lambda(\tau,1)\,(1 - I(\tau))^{n-1}$

where

(5.4.5)    $I(\tau) = \frac{1}{\bar{x}} \int_0^{\tau} (1 - F_x(u))\,du$

Proof: From Theorem 3.3, the single-process, unshared missing-page
probability is $\lambda(\tau) = 1 - F_x(\tau) = \lambda(\tau,1)$. To find
$\lambda(\tau,n)$ we note

           $\lambda(\tau,n)$ = Pr[a process, say p, references a page
                              at time t and finds it missing]
                            = Pr[p's most recent reference interval x to
                              the page in question satisfies $x>\tau$, and
                              the (n-1) other processes made no reference
                              to the page in question during $(t-\tau,t)$]

Since the processes are statistically independent, this last
probability becomes

           $\lambda(\tau,n) = (\Pr[x>\tau])\,(\Pr[\text{no reference to page in } (t-\tau,t)])^{n-1}$
                            $= (1 - F_x(\tau))\,(1 - I(\tau))^{n-1}$
                            $= \lambda(\tau,1)\,(1 - I(\tau))^{n-1}$

where the probability $(1 - I(\tau))$ is obtained from the arguments
given in Theorem 5.1.

QED.


The next theorem asserts that sharing reduces the missing-page
probability.

Theorem 5.4. Under the given assumptions ($F_x(u)$ unchanging,
processes running concurrently) the missing-page probability
$\lambda(\tau,n)$ is reduced by sharing:

(5.4.7)    $\lambda(\tau,n) < \lambda(\tau,1)$      if $\tau>0$, $n>1$

Proof: Since $(1 - I(\tau)) < 1$ if $\tau>0$, it follows that
$(1 - I(\tau))^{n-1} < 1$, and thence
$\lambda(\tau,n) = \lambda(\tau,1)\,(1 - I(\tau))^{n-1} < \lambda(\tau,1)$.

QED.

Recalling the discussion of Chapter 3, in which we showed
that lower missing-page probabilities are equivalent to lower
memory-usage costs, Theorem 5.4 verifies that sharing reduces
memory-usage costs. Indeed, in many circumstances it will be
true that

           $\lambda(\tau,n) \ll \lambda(\tau,1)$

that is, sharing is a pronounced improvement.
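
To see how quickly the missing-page probability can fall with n, eq. 5.4.3 may be tabulated. The sketch below is illustrative only: the function name missing_page_probability and the values $\lambda(\tau,1) = 0.01$ and $I(\tau) = 0.4$ are assumptions, not measurements.

```python
def missing_page_probability(lam1: float, I_tau: float, n: int) -> float:
    """Eq. 5.4.3: lambda(tau,n) = lambda(tau,1) * (1 - I(tau))**(n-1)."""
    return lam1 * (1.0 - I_tau) ** (n - 1)


# Assumed example values: lambda(tau,1) = 0.01 and I(tau) = 0.4.
for n in (1, 2, 4, 8):
    lam = missing_page_probability(0.01, 0.4, n)
    print(f"n={n}: lambda={lam:.6f}  mean virtual time between faults={1 / lam:.0f}")
```

The decrease is geometric in n, which is why even a few sharing processes can make $\lambda(\tau,n)$ much smaller than $\lambda(\tau,1)$ when $I(\tau)$ is not small.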

5.5. Paging Rate

Define the real time paging rate to be:

(5.5.1)    $\phi(\tau,n)$ = real time paging rate of one process when
                           (n-1) others are sharing the same program

That is, one process expects to see a real time interval of
length $1/\phi(\tau,n)$ between page waits.

Theorem 5.5. Let $\phi(\tau,n)$ be the real time paging rate as
defined above. Then:

(5.5.2)    $\phi(\tau,n) = \frac{\lambda(\tau,n)}{1 + \lambda(\tau,n)\,T}$

where $\lambda(\tau,n)$ is the missing-page probability, defined in
Theorem 5.3, and T is the traverse time.

Proof: In a virtual time interval of length V, one process
generates V information references and expects to encounter
$V\lambda(\tau,n)$ page waits. Therefore:

           $\phi(\tau,n) = \frac{\text{(number of page waits)}}{\text{(virtual time)} + \text{(page wait time)}} = \frac{V\lambda(\tau,n)}{V + V\lambda(\tau,n)\,T}$

QED.

If $n=1$ we obtain $\phi(\tau,1) = \phi(\tau)$, just like Theorem 4.3.

Let us compare the total page traffic in the two cases.
In Figure 5-2, let the total paging rates be denoted by

(5.5.3)    $\Psi_1(\tau) = n\,\phi(\tau,1)$
           $\Psi_2(\tau) = n\,\phi(\tau,n)$

We wish to show that sharing reduces page traffic.

Theorem 5.6. Let $\Psi_1(\tau) = n\,\phi(\tau,1)$ be the unshared total page
traffic, and let $\Psi_2(\tau) = n\,\phi(\tau,n)$ be the shared page
traffic. Then

(5.5.4)    $\Psi_2(\tau) < \Psi_1(\tau)$

Proof: We must show $\phi(\tau,1) - \phi(\tau,n) > 0$ if $n>1$. Consider

           $\phi(\tau,1) - \phi(\tau,n) = \frac{\lambda(\tau,1)}{1 + \lambda(\tau,1)\,T} - \frac{\lambda(\tau,n)}{1 + \lambda(\tau,n)\,T}$
                                        $= \frac{\lambda(\tau,1) - \lambda(\tau,n)}{(1 + \lambda(\tau,1)\,T)(1 + \lambda(\tau,n)\,T)} > 0$

where we have used $\lambda(\tau,1) - \lambda(\tau,n) > 0$ from Theorem 5.4.

QED.
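
A small numerical comparison of the two total paging rates may help fix ideas. In this sketch the function names paging_rate and total_traffic, and the values $\lambda(\tau,1) = 0.01$, $I(\tau) = 0.4$, $n = 4$ and $T = 1000$, are assumptions made for illustration.

```python
def paging_rate(lam: float, T: float) -> float:
    """Eq. 5.5.2: real time paging rate for missing-page probability lam, traverse time T."""
    return lam / (1.0 + lam * T)


def total_traffic(lam1: float, I_tau: float, n: int, T: float):
    """Return (Psi_1, Psi_2) of eq. 5.5.3 for n processes."""
    lam_n = lam1 * (1.0 - I_tau) ** (n - 1)      # eq. 5.4.3
    return n * paging_rate(lam1, T), n * paging_rate(lam_n, T)


# Assumed example: lambda(tau,1) = 0.01, I(tau) = 0.4, T = 1000 time units.
psi1, psi2 = total_traffic(0.01, 0.4, n=4, T=1000.0)
print(f"unshared Psi_1 = {psi1:.6f}, shared Psi_2 = {psi2:.6f}")
```

Note that when $\lambda T$ is large the paging rate saturates near $1/T$, so the reduction in real time page traffic is less dramatic than the reduction in $\lambda$ itself.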

5.6. Duty Factor

Define the duty factor to be

(5.6.1)    $\eta(\tau,n)$ = duty factor of one process when it is
                           sharing its program with (n-1) others

Recall that $\eta(\tau,n)$ is the fraction of time a process is in the
running state as opposed to the page wait state; thus, $\eta(\tau,n)$
measures the ability of a process to use a processor.

Theorem 5.7. Let $\eta(\tau,n)$ be the duty factor, as defined above.
Then

(5.6.2)    $\eta(\tau,n) = \frac{1}{1 + \lambda(\tau,n)\,T}$

where $\lambda(\tau,n)$ is the missing-page probability (Theorem 5.3),
and T is the traverse time.

Proof: In a virtual time interval of length V, the process
encounters $V\lambda(\tau,n)$ page waits. Then

           $\eta(\tau,n) = \frac{\text{(virtual time)}}{\text{(virtual time)} + \text{(page wait time)}} = \frac{V}{V + V\lambda(\tau,n)\,T}$

QED.

If $n=1$, we have $\eta(\tau,1) = \eta(\tau)$, just like Theorem 4.6.

Theorem 5.8. Sharing increases the duty factor:

(5.6.3)    $\eta(\tau,n) > \eta(\tau,1)$      if $n>1$, $\tau>0$

Proof: Since $\lambda(\tau,1) > \lambda(\tau,n)$ we must have

           $1 + \lambda(\tau,1)\,T > 1 + \lambda(\tau,n)\,T$

hence

           $\frac{1}{1 + \lambda(\tau,1)\,T} < \frac{1}{1 + \lambda(\tau,n)\,T}$

QED.

Again, under many circumstances $\lambda(\tau,1) \gg \lambda(\tau,n)$ and it is
not difficult to obtain $\eta(\tau,n) \approx 1$, even for small n. Thus,
sharing can result in markedly increased processing efficiency.
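
The approach of the duty factor toward 1 can be tabulated from eqs. 5.4.3 and 5.6.2. The sketch below uses an assumed function name duty_factor and assumed values $\lambda(\tau,1) = 0.01$, $I(\tau) = 0.4$ and $T = 1000$; it is an illustration, not a measurement.

```python
def duty_factor(lam: float, T: float) -> float:
    """Eq. 5.6.2: fraction of time a running-or-page-wait process is running."""
    return 1.0 / (1.0 + lam * T)


# Assumed example values.
lam1, I_tau, T = 0.01, 0.4, 1000.0
for n in (1, 2, 5, 10, 20):
    lam_n = lam1 * (1.0 - I_tau) ** (n - 1)      # eq. 5.4.3
    print(f"n={n:2d}: eta={duty_factor(lam_n, T):.3f}")
```

With these values an unshared process runs only about one eleventh of the time, whereas the duty factor climbs steadily toward 1 as more processes share the program.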

5.7. Variable Number of Participants

It is often the case that the number of participants in
a sharing problem is not fixed; instead, the number is a random
variable. The convexity theorem (Theorem 3.1) enables us to
obtain bounds on $\omega(\tau,n)$, $m(\tau,n)$, $\lambda(\tau,n)$, $\phi(\tau,n)$, and $\eta(\tau,n)$
when the average value $\bar{n}$ of n is known but the distribution of
n is not. These bounds are summarized in the following table.


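quantity                    symbol              convexity (in n)   bound

expected working set size   $\omega(\tau,n)$    convex             $\overline{\omega(\tau,n)} \le \omega(\tau,\bar{n})$
one-process memory demand   $m(\tau,n)$         convex             $\overline{m(\tau,n)} \le m(\tau,\bar{n})$
  (pages)
missing-page probability    $\lambda(\tau,n)$   concave            $\overline{\lambda(\tau,n)} \ge \lambda(\tau,\bar{n})$
paging rate                 $\phi(\tau,n)$      concave            $\overline{\phi(\tau,n)} \ge \phi(\tau,\bar{n})$
duty factor                 $\eta(\tau,n)$      convex             $\overline{\eta(\tau,n)} \le \eta(\tau,\bar{n})$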
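
Two of the tabulated bounds, those for the expected working set size and for the missing-page probability, can be checked directly against eqs. 5.3.3 and 5.4.3. The sketch below is an illustration under assumptions: the helper expectation, the parameter values, and the two-point distribution of n are hypothetical.

```python
def expectation(f, dist):
    """Mean of f(n) when n takes the value k with probability dist[k]."""
    return sum(prob * f(k) for k, prob in dist.items())


# Assumed example values.
z, lam1, I_tau = 100.0, 0.01, 0.4
A = 1.0 - I_tau
omega = lambda n: z * (1.0 - A ** n)             # eq. 5.3.3
lam = lambda n: lam1 * A ** (n - 1)              # eq. 5.4.3

dist = {1: 0.5, 7: 0.5}                          # hypothetical distribution of n
n_bar = expectation(lambda n: n, dist)           # mean number of participants

print(expectation(omega, dist), "<=", omega(n_bar))
print(expectation(lam, dist), ">=", lam(n_bar))
```

Knowing only $\bar{n}$, then, one overestimates the working set size and underestimates the missing-page probability by evaluating the formulas at $\bar{n}$.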


In the most general n-process sharing problem, information
can be in use by any combination of processes, and each possible
combination will be sharing different subsets of information.
Suppose Z is the program in use by processes $P_1,\ldots,P_n$. We can
partition Z into as many as $2^n - 1$ blocks, such that exactly some
subset of $P_1,\ldots,P_n$ is using each block. Each block associated
with just one process behaves as system 1 in Figure 5-2. Each
block associated with more than one process behaves as system 2
in Figure 5-2, having higher per-process efficiency, lower
per-process memory-usage costs, and lower paging rates. Therefore
the net effect across the program Z is better than the situation
when $P_1,\ldots,P_n$ share nothing at all. Thus, system 1 represents
the worst case behavior and system 2 represents the best case
behavior; any actual system would fall in between these two
extremes.

5.8. Summary

When working sets are defined so that a page belongs only
to one working set at a time, namely the one of the process that
most recently referenced it, memory usage costs tend to be
distributed among participants in accordance with their degrees of
participation. Implementation is straightforward.

Using a simple model with complete sharing we were able to
obtain strong results that quantitatively verify intuitive ideas:
provided that processes are run concurrently and the interreference
distribution is unchanging, sharing always improves performance,
regardless of the particular interreference distribution. In
many situations the improvements can be very pronounced.

Processes sharing information must be run concurrently
(requiring multiple processors) whenever they are not blocked
because

1. If run at widely separated intervals, the same
   information must (unnecessarily) be reloaded.

2. It is only when references are arriving concurrently
   to shared information that the benefits obtain.

The results obtained here apply to a collection of n
statistically independent processes, without regard to whether they
are components of multiprocess computations or single-process
computations. Thus, it should be clear that multiprocess
computations, by permitting interprocess information sharing, can
be very efficient, provided that processor-switching time is
small and there are enough processors to permit the parallel
operation of many processes.

CHAPTER 6

Demands and Balance

6.0. Introduction

We regard a computation, a collection of mutually cooperating
processes and information operating within a common name
space, as being the fundamental demand-making entity in a
computer system. A computation manifests itself by demanding the
joint use of processor and memory resources.

Because we want a computation to operate effectively as a
unit, we believe it is necessary to allocate resources to a
computation as a unit. We therefore assume that the entities being
scheduled for service are computations. This is a generalization
of existing scheduling philosophies, which call for scheduling
of processes.

Let C be a computation, with working set $W_C(t,\tau)$. The
working set size $\omega_C(t,\tau)$ will be used to define C's memory demand.
If, on the one hand, C is a single-process computation, its
expected running time beyond the present will be used to define
its processor demand. If, on the other hand, C is a multiprocess
computation, the number of active component processes will be
used to define its processor demand. We make this distinction
because in multiprocess computations the number, rather than the
duration, of processes is important, whereas in single-process
computations the duration of the process is important.

Each computation will be assigned a system demand consisting
jointly of its processor and memory demands. Computations
requiring the use of system resources will be segregated: those
in the standby set temporarily receive no service, whereas those
in the balance set receive service. The system is balanced when
the total demand of the balance set matches the available
equipment¹. A balance policy is a resource allocation policy that
regulates membership in the balance set so that the system
remains balanced.

We shall study all these concepts in more detail, then
examine general properties of balance policies, and conclude
the chapter with a survey of the pertinent literature.

¹ Recall that the N processors and M pages of main memory
constitute the equipment. Because we may wish to hold some
equipment in reserve, we assume that constants $\alpha$ and $\beta$ have been
given (we shall discuss how in Chapter 8), where $0<\alpha<1$ and
$0<\beta<1$, and we will say that $\alpha N$ processors and $\beta M$ main
memory pages constitute the available equipment.

6.1. Memory Demand

We have defined a working set memory management policy to
be one that permits a process to run if and only if there is
enough uncommitted space in main memory to accommodate its
working set. Working set pages fill these uncommitted slots on
demand. Thus, the working set size $\omega(t,\tau)$ is a useful measure
of memory demand.

Let C be a computation consisting of processes $P_1,\ldots,P_n$.
If $W_j(t,\tau)$ is the working set of process $P_j$, then the working
set of C is

(6.1.1)    $W_C(t,\tau) = \bigcup_{j=1}^{n} W_j(t,\tau)$

If we use the working set definition given in Section 5.1
(a page is in the working set of whatever process most recently
referenced it), working sets will be disjoint, and their sizes
add:

(6.1.2)    $\omega_C(t,\tau) = \sum_{j=1}^{n} \omega_j(t,\tau)$

We define the memory demand $m_C(t)$ of computation C at time t
to be:

(6.1.3)    $m_C(t) = \min\left(1, \frac{\omega_C(t,\tau)}{M}\right)$,      $0 \le m_C(t) \le 1$

where M is the number of pages comprising main memory, and
$\omega_C(t,\tau)$ is the size of C's working set.

Clearly, $m_C(t)$ represents the fraction of the memory
resource demanded by C at time t. If C's working set $W_C(t,\tau)$
contains more than M pages (it exceeds memory) we regard its
memory demand as being $m_C(t)=1$ because it is demanding the entire
resource. Presumably M is large enough so that the probability
$\Pr[m_C(t)=1]$ (over the ensemble of all computations) is very
small.

The definition of memory demand applies to any computation,
whether it be single-process or multiprocess.
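
Equations 6.1.2 and 6.1.3 translate directly into a short computation. The function name memory_demand and the sample sizes below are our own, chosen only for illustration.

```python
from typing import Iterable


def memory_demand(process_working_set_sizes: Iterable[int], M: int) -> float:
    """Eq. 6.1.3, with omega_C obtained by adding the disjoint sizes (eq. 6.1.2)."""
    omega_C = sum(process_working_set_sizes)
    return min(1.0, omega_C / M)


print(memory_demand([10, 20, 30], M=240))   # 0.25
print(memory_demand([200, 100], M=240))     # 1.0 (working set exceeds memory)
```

The same function serves a single-process computation, for which the list of sizes has one element.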

6.2. Processor Demand

We assume that the processor demand of a multiprocess
computation depends on the number of processes, whereas the
processor demand of a single-process computation depends on the
duration of the process.

In contemporary computer systems, the page wait time T
depends mostly on the rotation time of a device. Because the
switching time of a processor is relatively much smaller than T,
it is worthwhile to switch a processor to a second process during
a page wait of the first.

To be more precise, let S represent the time required to
switch a processor from one process to another. S is actually
the expectation of a random variable composed of electronic
switching times and scheduling delays. If T>S, it is not
economical to dedicate a processor to a process during a page wait,
whereas, if T<S, it is economical to do so.

Define the binary random variable $\pi(t)$ for a given process
at time t to be:

(6.2.1)    $\pi(t) = 1$ if a processor is assigned to the process at time t,
           and $\pi(t) = 0$ otherwise

This quantity $\pi(t)$ is related to the processor demand of a
process. The relationships among process states, memory demand,
$\pi(t)$, the traverse time T, and the processor switching time S,
are summarized in the following table.

process state    memory demand                    processor demand

blocked          $m(t) = 0$                       $\pi(t) = 0$
ready            $m(t) > 0$                       $\pi(t) = 0$
running          $m(t) > 0$                       $\pi(t) = 1$
page wait        $m(t+T) = m(t)$ + [illegible]    $\pi(t) = 1$ if T<S
                                                  $\pi(t) = 0$ if T>S


We have assumed that the entities to be scheduled (hereafter
called jobs) are computations -- specific sets of processes
rather than individual processes. Thus, we assume that all of a
computation's non-blocked processes are running (or page wait),
or else all such processes are ready. We define the states of
a computation to be:

1. enabled: all non-blocked processes are running or
   page wait.

2. standby: all non-blocked processes are ready.

3. disabled: all processes are blocked.

In our work here, only these states are permitted.

Correspondingly, we define the working set of processes
$P(C,t)$ of a computation C at time t to be:

(6.2.2)    $P(C,t)$ = the non-blocked processes in C at time t,
                      if C is enabled or standby
           $P(C,t) = \emptyset$, if C is disabled

where $\emptyset$ is the empty set. Note that $P(C,t)$ is well-defined
even if C is a single-process computation.

Using these ideas, we shall define processor demand in
both the single-process and multiprocess computation cases.

6.2.1. Multiprocess Computations

In this case, the processor demand is concerned with the
number of processes in a computation, because the computer
system must know how many processing units to assign.

Let C be a multiprocess computation. We define C's
processor demand $p_C(t)$ at time t to be:

(6.2.3)    $p_C(t) = \min\left(1, \frac{|P(C,t)|}{N}\right)$,      $0 \le p_C(t) \le 1$

where N is the number of processors, and $P(C,t)$ is the working
set of processes in C (eq. 6.2.2).

It is clear that $p_C(t)$ represents the fraction of the
processor resources needed by C at time t. Presumably N is
large enough so that the probability $\Pr[p_C(t)=1]$ (over the
ensemble of all computations) is very small.

Note the symmetry between the definitions of processor
and memory demand (eqs. 6.1.3 and 6.2.3), in the case of
multiprocess computations.
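
The state definitions of Section 6.2 and the demand of eq. 6.2.3 can be put together in a short sketch. The names State, working_set_of_processes, computation_state and processor_demand are hypothetical, and rejecting a mixture of ready and running processes with an exception is our own reading of the remark that only the three computation states are permitted.

```python
from enum import Enum
from typing import Dict, Set


class State(Enum):
    BLOCKED = "blocked"
    READY = "ready"
    RUNNING = "running"
    PAGE_WAIT = "page wait"


def working_set_of_processes(states: Dict[str, State]) -> Set[str]:
    """P(C,t) of eq. 6.2.2: the non-blocked processes (empty if C is disabled)."""
    return {name for name, s in states.items() if s is not State.BLOCKED}


def computation_state(states: Dict[str, State]) -> str:
    """Classify a computation as enabled, standby, or disabled."""
    active = [s for s in states.values() if s is not State.BLOCKED]
    if not active:
        return "disabled"
    if all(s is State.READY for s in active):
        return "standby"
    if all(s in (State.RUNNING, State.PAGE_WAIT) for s in active):
        return "enabled"
    raise ValueError("mixture of ready and running processes is not permitted")


def processor_demand(states: Dict[str, State], N: int) -> float:
    """Eq. 6.2.3: p_C(t) = min(1, |P(C,t)| / N)."""
    return min(1.0, len(working_set_of_processes(states)) / N)


C = {"P1": State.RUNNING, "P2": State.PAGE_WAIT, "P3": State.BLOCKED}
print(computation_state(C), processor_demand(C, N=8))   # enabled 0.25
```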


6.2.2. Single-process Computations 


In the case of single-process computations, $P(C,t)$
contains at most one process; so we must be concerned with its
duration in order to know how long to assign a processor to it.
Thus, computer systems in which single-process computations
predominate (systems such as Multics or IBM System 360) must use
a somewhat different definition of processor demand. Because
of this, we are unable to completely preserve the symmetry
between the definitions of processor and memory demand.

We should like to define processor demand in the single-process
case so that a processor demand has a meaning in the
time domain analogous to the meaning of a memory demand in the
space domain. A useful method (by no means the only one) is
described in the following paragraphs.

Let the random variable q denote the virtual time interval
between interactions. It has been found [C4,F4] that the
probability density function for q, $f_q(u)$, may be modelled by a
hyperexponential distribution:

(6.2.4)    $f_q(u) = c\,a\,e^{-au} + (1-c)\,b\,e^{-bu}$,      $0<a<b$,  $0<c<1$

$f_q(u)$ is diagrammed in Figure 6-1; most of the probability is
concentrated toward small q (i.e., frequently interacting
processes), but $f_q(u)$ has a long exponential tail.

Given that it has been y vtu since the last interaction,
the conditional density function for the time beyond y until
the next interaction is

(6.2.5)    $f_{q|y}(u) = \frac{f_q(u+y)}{\int_y^{\infty} f_q(v)\,dv}$,      $u \ge 0$

which is just that portion of $f_q(u)$ for $q>y$ with its area
normalized to unity. The conditional expectation function $Q(y)$ is

(6.2.6)    $Q(y) = \int_0^{\infty} u\,f_{q|y}(u)\,du = \frac{\frac{c}{a}\,e^{-ay} + \frac{1-c}{b}\,e^{-by}}{c\,e^{-ay} + (1-c)\,e^{-by}}$

$Q(y)$ is the expected time beyond y until the next interaction;
it is illustrated in Figure 6-2. It starts at

(6.2.7)    $Q(0) = \frac{c}{a} + \frac{1-c}{b}$

and rises toward a constant maximum

(6.2.8)    $Q_{\max} = Q(\infty) = \frac{1}{a}$

Note that, for large y, the conditional expectation $Q(y)$ becomes
independent of y.

[Figure 6-1. Probability density function $f_q(u)$.]

[Figure 6-2. Conditional expectation function $Q(y)$.]

The conditional expectation function $Q(y)$ is a useful
prediction function -- if a process has consumed y vtu since its
last interaction, we may expect it to consume an additional
$Q(y)$ vtu before its next interaction.

A reasonable choice of quantum to allocate a process might
be $kQ(y)$ for some suitable constant $k>1$. It should be clear
that $Q(y)$ can be measured and updated from time to time by the
computer system.
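
The prediction function of eq. 6.2.6 is simple to compute. The sketch below is an illustration only: the parameter values a, b, c and the multiplier k are hypothetical, and dividing numerator and denominator by $e^{-ay}$ is a numerical convenience of ours, not a step taken in the text.

```python
import math


def Q(y: float, a: float, b: float, c: float) -> float:
    """Eq. 6.2.6 for the hyperexponential density of eq. 6.2.4 (0 < a < b, 0 < c < 1).

    Numerator and denominator are both divided by exp(-a*y) so that large y
    does not underflow to 0/0.
    """
    r = math.exp(-(b - a) * y)
    return (c / a + (1.0 - c) / b * r) / (c + (1.0 - c) * r)


a, b, c = 0.1, 2.0, 0.3     # hypothetical parameters
k = 1.5                     # hypothetical quantum multiplier, k > 1
for y in (0.0, 1.0, 5.0, 50.0):
    print(f"y={y:5.1f}: Q(y)={Q(y, a, b, c):6.3f}  quantum k*Q(y)={k * Q(y, a, b, c):6.3f}")
```

Since $Q(y)$ is a weighted average of $1/a$ and $1/b$ whose weight on $1/a$ grows with y, it rises from $Q(0)$ toward $1/a$, in agreement with eqs. 6.2.7 and 6.2.8.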

We should point out that this notion -- a conditional
expectation function to predict processor usage -- is very
useful and quite independent of the hyperexponential distribution
hypothesis¹. We have formulated it in terms of the
hyperexponential because the hyperexponential is a good model and
because the hyperexponential has the interesting property that the
prediction function $Q(y)$ becomes independent of y for large y.

Just as we are unwilling to commit more than M pages of
memory, so we may be unwilling to commit processor time for more
than a standard interval A into the future. This interval A
can be chosen to reflect the maximum tolerable response time to
a user: for if the set of processes receiving service has total
expected time consumption not exceeding A, then no process in
this set expects to wait longer than A before its own interaction.
Just as M is a space constraint, so A is a time constraint.

¹ That is, other prediction functions might be used. The CTSS
scheduler [C6,S3], for example, happens to use the prediction
function $Q(y)=y$.

We define the processor demand $p_C(t)$ of a single-process
computation C at time t to be:

(6.2.9)    $p_C(t) = \frac{Q(y_C)}{NA}$,      $\frac{Q(0)}{NA} \le p_C(t) \le \frac{Q_{\max}}{NA}$

where $y_C$ is the time used by C's process since its last
interaction.

Since N processors have NA units of time to be committed
among them, $p_C(t)$ is the fraction of this total that C is
expected to need before its next interaction. Note that this
definition of processor demand is just the previous definition
(eq. 6.2.3), with $|P(C,t)| = 1$, multiplied by the expected
duration of processor use. It is no longer symmetric with the
definition of memory demand (eq. 6.1.3).
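
As a small worked instance of eq. 6.2.9, the sketch below computes the demand from a given value of $Q(y_C)$. The function name single_process_demand and the numbers (4 processors, an interval A of 20 vtu, and $Q(y_C) = 5$ vtu) are assumptions for illustration.

```python
def single_process_demand(expected_remaining: float, N: int, A: float) -> float:
    """Eq. 6.2.9: p_C(t) = Q(y_C) / (N * A), given the value Q(y_C)."""
    return expected_remaining / (N * A)


N, A = 4, 20.0            # hypothetical: 4 processors, commitment interval of 20 vtu
q_now = 5.0               # hypothetical value of Q(y_C), in vtu
p = single_process_demand(q_now, N, A)
print(p)                  # 0.0625
print(N * A * p)          # 5.0, the expected processor time recovered from the demand
```

Multiplying the demand by NA recovers the expected processor time, which is how a single-process demand is to be interpreted.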

6.3. System Demand

We define the system demand $d_C(t)$ of a computation C at
time t to be a pair

(6.3.1)    $d_C(t) = (p_C(t), m_C(t))$

where $p_C(t)$ and $m_C(t)$ are the processor and memory demands of C.

That the processor demand is $p_C(t)$ tells us to expect C's
immediate processor need to be $N p_C(t)$ processors¹. That the
memory demand is $m_C(t)$ tells us to expect C's immediate memory
need to be $M m_C(t)$ pages.

This definition applies to C being either a multiprocess
or a single-process computation. It expresses the dual
manifestation of C, as a demand for processors and as a demand for
memory. $d_C(t)$ must be considered as a two-dimensional random
variable, with unknown correlations between $p_C(t)$ and $m_C(t)$.

¹ If C is a single-process computation, then $p_C(t)$ tells us to
expect C to require one processor for $N A\,p_C(t)$ vtu.

6.4. System Balance

Let numbers $\alpha$ and $\beta$ be given, where $0<\alpha<1$ and $0<\beta<1$, and
let $D(t)$ represent the total demand presented by enabled
computations:

(6.4.1)    $D(t) = \sum_{C \text{ enabled}} d_C(t)$

The computer system is said to be balanced at time t if

(6.4.2)    $D(t) = (\alpha,\beta)$

The system is processor-balanced if

(6.4.3)    $\sum_{C \in B} p_C(t) = \alpha$

The system is memory-balanced if

(6.4.4)    $\sum_{C \in B} m_C(t) = \beta$

That the system is balanced means that the total resource
requirement of enabled computations is simultaneously for $\alpha N$
processors and for $\beta M$ main memory pages.

The resource allocation problem is to decide dynamically
which computations to enable so that balance is maintained.
This set of enabled computations at time t will be called the
balance set B. In general, the system will not be balanced at
each instant of time; instead there will be a sequence of instants,
called the decision points, at which the demand of B is made to
return to the desired demand $(\alpha,\beta)$ by admitting or removing
computations from the balance set B.
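
To make the idea of a decision point concrete, the sketch below admits standby computations to the balance set as long as the total demand stays within $(\alpha,\beta)$. This first-fit rule is purely hypothetical and is offered only as an illustration; the text does not prescribe it, and the names Demand, total_demand and admit are our own.

```python
from typing import List, Tuple

Demand = Tuple[float, float]   # (processor demand p, memory demand m)


def total_demand(demands: List[Demand]) -> Demand:
    """D(t) of eq. 6.4.1: the componentwise sum of the demands."""
    return (sum(p for p, _ in demands), sum(m for _, m in demands))


def admit(balance_set: List[Demand], standby: List[Demand],
          alpha: float, beta: float) -> Tuple[List[Demand], List[Demand]]:
    """Hypothetical first-fit rule: admit a standby job if neither total exceeds its target."""
    admitted = list(balance_set)
    p_tot, m_tot = total_demand(admitted)
    waiting: List[Demand] = []
    for p, m in standby:
        if p_tot + p <= alpha and m_tot + m <= beta:
            admitted.append((p, m))
            p_tot, m_tot = p_tot + p, m_tot + m
        else:
            waiting.append((p, m))
    return admitted, waiting


B, S = admit([(0.25, 0.25)],
             [(0.5, 0.25), (0.25, 0.5), (0.125, 0.125)],
             alpha=0.875, beta=0.75)
print(total_demand(B), S)   # (0.875, 0.625) [(0.25, 0.5)]
```

In this example the balance set becomes processor-balanced but not memory-balanced, which shows that the two conditions of eqs. 6.4.3 and 6.4.4 need not be reached together by a simple rule.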

One of the chief advantages of balance is simply that the
balance set B presents (at least at decision points) a known
demand; that is,

(6.4.5)    $p_B(t) = \alpha$,  $m_B(t) = \beta$,      t a decision point


A major design problem, one we shall discuss in Chapter 8,
is that of determining the balance parameters $\alpha$ and $\beta$. These
parameters will be chosen so that, just before a decision point t,
the probabilities

(6.4.6)    $\Pr\left[\sum_{C \in B} p_C(t-\delta) > 1\right]$,  $\Pr\left[\sum_{C \in B} m_C(t-\delta) > 1\right]$,      $\delta > 0$

are as small as desired.

In Figure 6-3 we have diagrammed the flow of jobs (i.e.,
computations) among the states enabled, standby, and disabled.
New jobs enter the standby set. The scheduler regulates
membership in the balance set so that balance is maintained. If a
computation becomes disabled, it enters the disabled set. These
points should be noted:

1. Each job in the standby set has its demand associated
   with it. When a new job enters the standby set, an
   estimate of demand must be associated with it. In the
   absence of reliable predictive information, the best
   estimate is $(\bar{p},\bar{m})$, where $\bar{p}$ is the average
   processor demand over all computations, and $\bar{m}$ is the
   average memory demand over all computations.

[Figure 6-3. Job flow in balanced computer system. Diagram not
reproduced; recoverable labels: new jobs, quitting jobs, scheduler,
assign quantum, quantum expired, some process unblocked, all
processes blocked, STANDBY SET, BALANCE SET, DISABLED SET.]

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2. In general, when the scheduler admits a new job to the 
balance set, it allocates a quantum q to the job, where 
gq represents the total virtual-time processor consump- 
tion permitted to the job’s processes. If q expires 
(at time t), the job is returned to the standby set, 
with its current demand (p(t),m(t)). 

3. The balance set contains a mixture of running and page 
wait processes, together with their working sets. 
Memory management follows a working set strategy. 
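The job flow of Figure 6-3 can be read as a small state machine. The following is a minimal sketch under that reading; the state names, the event names, and the `step` helper are illustrative labels chosen here, and the transition from the disabled set back to the standby set is inferred from the figure labels rather than stated in the text.

```python
from enum import Enum


class State(Enum):
    STANDBY = "standby"
    BALANCE = "balance"
    DISABLED = "disabled"
    DONE = "done"


# Transitions read off the labels of Figure 6-3 (event names are hypothetical).
# A page-wait process stays in the balance set, so page waits cause no transition.
TRANSITIONS = {
    (State.STANDBY, "assign_quantum"): State.BALANCE,
    (State.BALANCE, "quantum_expired"): State.STANDBY,
    (State.BALANCE, "all_processes_blocked"): State.DISABLED,
    (State.DISABLED, "some_process_unblocked"): State.STANDBY,
    (State.BALANCE, "quit"): State.DONE,
}


def step(state, event):
    """Return the next state, or raise ValueError if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}") from None
```

For example, `step(State.STANDBY, "assign_quantum")` yields `State.BALANCE`, while `step(State.STANDBY, "quit")` raises `ValueError`, since in this sketch a job can only quit from the balance set.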

In the case of single-process computations, the following 
terminology (from Multics) is often used. Any process in the 
standby set is said to be ready, and so the standby set may be 
called the ready list. Any process in the disabled set is blocked,
and so the disabled set may be called the blocked list. Any
process in the balance set is either running or page-wait, and
so the balance set may be called the running list. There is,
however, a very important difference from Multics: here, a page-
wait process remains a member of the balance set; in Multics,
a page-wait process is regarded as being blocked.

6.5. Balance Policies

A balance policy is a resource allocation policy that keeps 
the computer system balanced. It is implemented by the scheduler 
shown in Figure 6-3, which regulates membership in the balance
set. Expressed as a minimization problem, a balance policy is:

$$ \{\text{minimize } |D(t) - (\alpha,\beta)|\} \tag{6.5.1} $$

where the minimization of |D(t)-(α,β)| is to be understood
componentwise.


6.5.1. Demand and Usage Spaces 


To help visualize the operation of a balance policy, it is
useful to define two spaces: the demand space V and the usage
space U. We regard V and U as being two-dimensional: a typical
point (p,m) in either space is capable of representing the demand
of some computation. The demand space V contains a set of
specially designated points, the demand points, one representing
the demand d_c(t) of each enabled computation c in the balance set.
The demand points are time-varying in position. The usage space
U contains two specially designated points: the actual demand
point D(t) and the desired demand point (α,β). A balance policy
tries to move the actual demand point, along some path, closer
(in the sense of eq. 6.5.1) to the desired demand point. These
ideas are illustrated in Figure 6-4.

Unfortunately we must be careful not to interpret the spaces
V and U as metric spaces, because the path D(t) follows when it
moves toward (α,β) affects system behavior. If these spaces were
metric spaces we would be able to assign a magnitude, say μ(d),
to a demand d, which would in turn imply that system performance
would depend only on the magnitude of the imbalance, μ(D(t)-(α,β)).
The following argument shows that this is not the case.

Figure 6-4. Demand and Usage spaces.

Figure 6-5 shows how system performance will depend on the
path. The paths shown are:

1. path 1. (first balance memory, then processor.) First
examine the standby set for a subset of computations
with memory demand δ_m; second, select from this subset
a computation with processor demand δ_p. In the first
step the whole standby set is examined, so it is highly
probable a computation with memory demand δ_m will be
found. In the second step only a subset of computations
(those with memory demand δ_m) is examined, so it is
less likely a computation with processor demand δ_p
will be found. The result is that memory usage is
tightly distributed about βM, whereas processor usage
is loosely distributed about αN.

2. path 2. (first balance processor, then memory.) This
has exactly the opposite effect from path 1, the memory
usage being loosely distributed about βM, the processor
usage being tightly distributed about αN.

3. path 3. Balance both processor and memory simultaneously
by examining the standby set for a computation whose
demand is exactly (δ_p,δ_m). This has an effect inter-
mediate between those of path 1 and path 2, the processor
and memory usage tending to be equally distributed
about the desired points.
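The three paths above can be sketched as three selection rules over the standby set. This is only an illustration under assumptions made here: each standby demand is a pair `(p, m)`, "demand δ_m" is relaxed to "within a tolerance `tol` of δ_m", and path 3 is approximated by the candidate whose worst component error is smallest. The function names are hypothetical.

```python
def pick_path1(standby, dp, dm, tol):
    """Path 1: filter on memory demand first, then choose the best processor fit."""
    by_memory = [d for d in standby if abs(d[1] - dm) <= tol]
    if not by_memory:
        return None
    return min(by_memory, key=lambda d: abs(d[0] - dp))


def pick_path2(standby, dp, dm, tol):
    """Path 2: filter on processor demand first, then choose the best memory fit."""
    by_processor = [d for d in standby if abs(d[0] - dp) <= tol]
    if not by_processor:
        return None
    return min(by_processor, key=lambda d: abs(d[1] - dm))


def pick_path3(standby, dp, dm):
    """Path 3: treat both components together (smallest worst-case component error)."""
    if not standby:
        return None
    return min(standby, key=lambda d: max(abs(d[0] - dp), abs(d[1] - dm)))
```

With the same standby set, the three rules can return different computations: path 1 accepts a poor processor fit to obtain a close memory fit, path 2 does the reverse, and path 3 compromises between the two.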

Now in itself the path effect need not interfere with performance.
But in computer systems in which the traverse time is large and
computations are single-process, it is extremely important to
balance memory properly in order to avoid thrashing, and to
avoid accumulating too many traverse times from returning pages.
In such systems path 1 is the best path. In such systems we can
tolerate higher imbalance in processor usage, because (see
eq. 6.2.8) the standard interval A is a delay or response con-
straint and is therefore a value judgment, whereas the memory
size M is a physical constraint.

Conversely, in computer systems in which the traverse time
is not large and computations are multiprocess, path 3 is the
best path because we are equally concerned with processor and
memory balance.

Figure 6-5. The path effect.


6.5.2. Properties of a Balance Policy 


Although we shall defer detailed discussion about implemen- 
ting balance policies until the next chapter, it is nonetheless 
useful to point out certain properties the implementation should 
or will have, consistent with the objectives of a multiprocess 
computer system. 

First, the balance criterion is not necessarily an equipment
utilization criterion. If (α,β) are set close to (1,1) then
certainly equipment is fully utilized. If (α,β) are set much
less than (1,1), the service to users is improved because, as
we discussed in Chapter 1, there is an inverse tradeoff between
utilization and service. Therefore α and β can be regulated by
the administration to meet its current objectives.

Second, balance significantly diminishes the possibility
of thrashing because, by proper selection of the balance para-
meters α and β, the probability that the system actually enters
an overload condition can be made arbitrarily small.

Third, we shall see in Chapter 7 that an implementation of
a balance policy can be made to have the property that the rel-
ative computational overhead required to restore balance depends
only on the degree of imbalance (i.e., (δ_p,δ_m) in Figure 6-5)
and not on the size of the total demand D(t). This guarantees
that balance is configuration independent in the sense that
the same basic strategy scales over a very wide range of loads.


Fourth, the reader will recall that we have assumed fairness
should be built in to a balance policy. We will say that a policy
is fair if a job’s waiting time depends only on its order of
arrival relative to jobs of comparable demand, and not on its
order of arrival relative to jobs of different demand. Many
existing scheduling philosophies, which tend to stall long jobs
in deference to shorter jobs, are not fair by this definition.
In the next chapter we shall show how to incorporate fairness
into a balance policy.

Fifth, balance makes it possible to make jobs independent
of one another, in the sense that an increase in the demand of
one will not interfere with the resources in use by another.
If the balance parameters α and β are set less than unity, then
there will be a slack of (1-α)N processors and (1-β)M memory
pages to absorb just such demand fluctuations. This paves the
way to tractable analysis.

Sixth, a balance policy should tend to run two processes
concurrently whenever they are sharing information, in order to
reap the benefits of sharing.

6.6. Survey of the Literature


Much of the literature is devoted exclusively to problems 
of processor scheduling or to memory allocation; little of it 
is devoted to a unified treatment of both. 

By far the greatest part of the literature addresses itself 
to scheduling; a myriad of algorithms and analyses have appeared. 
Estrin and Kleinrock [E1] and then Coffman and Kleinrock [C3]
have very good surveys of the important algorithms. In refer- 
ence [C3] it is demonstrated that all the algorithms are sus- 
ceptible to countermeasures: because most algorithms favor small 
tasks (both in size and duration) either explicitly or implicitly, 
a user may significantly improve service to himself by subdivid- 
ing his job into a sequence of small tasks (provided no one else 
does this too). This does overall efficiency no good. 

A variety of papers report on memory allocation [B1,B2,D2,
D4,P4,R2]; we have already discussed these in Chapter 3. 

Not much reported work deals with interactions between 
processor and memory. The approaches most often used are either 
to regard scheduling as the primary allocation function, memory 
management as the secondary allocation function, or else to re- 
gard them both as being independent. It should be obvious by 
now that in existing systems the problem of memory management 
is of far greater importance than that of scheduling, on account 
of the very large traverse time and the serious possibility of 
thrashing. Therefore, if memory is properly managed, almost 
any reasonable scheduling algorithm will function well. Con-
versely, if memory is mismanaged or overloaded, the particular
scheduling algorithm will be of little consequence.

No work is reported that gives insight into how scheduling
problems are compounded by information sharing. For example, 
it is clear that scheduling algorithms should tend to run two 
processes together in time whenever they are sharing information. 
However no study reports on the extent to which a scheduling 
algorithm should tend to do this, or whether it should tend to 
do this explicitly at all. 

There has been some work on system balance. It falls into 
two classes: static balance, the problem of determining an op- 
timum equipment configuration for a given program mix; and dyn- 
amic balance, the problem of dynamically adjusting the load to 
the existing equipment. Nielsen [N1,N2] has reported on simul- 
ation work for static balance which has been of considerable help 
in configuring the Stanford version of IBM System 360. Saltzer 
[S2] describes some rule-of-thumb performance measurements that 
may be used to test a system to decide whether or not it is 
statically balanced or whether it is thrashing. 

The most interesting work of all concerns dynamic balance. 
Oppenheimer and Weizer [O2] report that their simulation of the
RCA Spectra 70/46 Time Sharing Operating System verifies conclus-
ively that even relatively primitive notions of dynamic memory
balance result in markedly improved performance. O’Neill,
Belady, and colleagues [O1] have been experimenting with a
load-leveler on the M44/44X computer, and have been very pleased
with the results.

6.7. Summary


The important concepts introduced in this chapter all
center around the idea of supply-and-demand allocation in large 
computer systems. Memory demand is based on working set size. 
Processor demand is based on the intensity or the duration of 
processor requirements. System demand is a composite of these 
two types of demand. 

Computations requiring resources are divided into two classes: 
standby set computations, which are temporarily denied use of 
system resources; and balance set computations, which are granted 
the use of system resources. The system is balanced just when
the total demand of the balance set matches the available equipment. 

A balance policy is a resource allocation policy that reg- 
ulates membership in the balance set so that system balance is 
maintained. The demand space and usage space were introduced as 
conceptual aids to understanding properties of balance policies. 
An important property, the path effect, is the dependence of per- 
formance on the order in which the processor and memory resources 
are balanced. 

We distinguished two aspects of balance. The first aspect, 
static balance (controlled by the administration) is the problem 
of matching the equipment configuration to the total demand of 
the user community. We return to this in Chapter 8. The second
aspect, dynamic balance (controlled by the scheduler) is the
problem of matching the demand of the balance set to the existing
equipment. We direct attention to this in the next chapter.

CHAPTER 7


Implementation of Balance Policies 


7.0. Introduction 

The general structure and basic properties of a balance 
policy have been given in Chapter 6. It remains to show how a 
balance policy can be realized. 

The three most important things we are requiring from a 
balance policy are: first, that it keep the system balanced; 
second, that it be fair; and third, that it assure reasonable 
policies with respect to other criteria such as minimum response 
time. 

We distinguish two cases: the one-dimensional case is ap-
plicable to contemporary computer systems, in which the threat
of thrashing makes memory balance so much more important than
processor balance; the two-dimensional case is applicable to
future computer systems, in which both processor and memory
balance will be equally important. In one-dimensional cases,
we explicitly balance only one resource type and try to achieve
reasonable balance of the other resource type, whereas in two-
dimensional cases we explicitly balance both resource types
simultaneously.

The most important result of this chapter is: we formulate 
mathematical programming problems whose solutions, found dy- 
namically by the scheduler, are almost-optimum balance policies. 

In Section 7.1 we present an analysis of a single-server, 
first-come, first-served queue, because this can act as a worst 
case analysis for the behavior of the queue structures we pro- 
pose. In Section 7.2 we study properties of queue structures 
that guarantee fair policies, and we find bounds on the processor 
and memory requirements needed to act as servers to the queues. 
In Section 7.3 we formulate the one-dimensional mathematical 
programming problem, and give a simple algorithm that finds the 
optimum solution in the particular case of memory balance. In
Section 7.4 we formulate the two-dimensional mathematical pro-
gramming problem, but we do not attempt to give solutions.

7.1. Analysis of a Single-Server Queue


The statistics of a first-come, first-served (FCFS) single 
server queue can be used to obtain the worst case behavior of a 
balanced computer system. 

The queueing system under consideration is shown in Figure
7-1. Job interarrival times are exponentially distributed with
mean 1/a. That is, if {t_n} is the sequence of instants at which
jobs arrive, the interarrival times τ_n = t_n - t_{n-1} are identically
distributed according to the density function

$$ f_{\tau_n}(u) = f_\tau(u) = a\, e^{-au}, \qquad u > 0 \tag{7.1.1} $$

Similarly, the job service times are exponentially distributed,
with mean 1/b. The rate (a) of job arrivals remains fixed, re-
gardless of the number of jobs in the system; in other words,
we regard the source population as being infinite.

Our use of exponential interarrival and service times, and 
an infinite source population, requires justification. 

We are directing the analysis toward large systems, in which
a large source population generates the service requests. In
these systems, exponential interarrival and service distributions
are good models for at least two reasons. First, it is well
known that, when a large population generates service requests,
the times between arrivals from the population tend to be ex-
ponentially distributed, even though the times between arrivals
from a particular member of the population are not. The tele-
phone system, for which it has been found that the interarrival
and service distributions are very nearly exponential [P3,p.281],
is an excellent example of this behavior. Second, there is
considerable evidence to indicate that many interarrival and
service distributions are approximately hyperexponential in the
case of not too large populations [C4,F4]; these distributions
have exponential tails. By assuming exponential arrival and ser-
vice distributions, we are modelling the tails of the actual dis-
tributions, thereby providing a worst case analysis.

Figure 7-1. Single-server queue. (Diagram labels: infinite
source population, FCFS queue, single server, done.)

Furthermore, the exponential case, interesting in its own 
right, can yield insights perhaps not obtainable from protracted 
analysis. 

When the source population is finite and the server is sat-
urated, jobs pile up in the queue and, there being fewer reques-
tors remaining in the source population, the arrival rate slack-
ens. Because we are interested in the unsaturated behavior
of a computer system our use of infinite source populations 
(in which the arrival rate is independent of queue length) is 
not unreasonable. Assuming that balance policies keep the com- 
puter system out of saturation, a job will not be seriously de- 
layed in the queues, and the source population will not be ser- 
iously depleted. Thus, the arrival rate will not significantly 
slacken. Indeed, experience has shown that infinite population 
models approximate finite population systems with surprisingly 
little error, for population sizes as small as 20 [F2,Vol.1,
p.143ff].

Nevertheless, since we are in fact making approximations 
to actual behavior by using these assumptions, we can only 
interpret the results as being system averages. 

When (n-1) jobs are in the queue and 1 job is in service,
the system is in state n. Let π_n denote the steady state prob-
ability of state n.

Theorem 7.1. In the queueing system described above, the
steady state probability of state n is

$$ \pi_n = r^n (1-r), \qquad n = 0, 1, 2, \ldots \tag{7.1.2} $$

where r = a/b. The mean and variance of n are

$$ \bar{n} = \frac{r}{1-r}, \qquad \sigma_n^2 = \frac{r}{(1-r)^2} \tag{7.1.3} $$
Proof: In a small time interval dt, the probability of a tran-
sition from state (n-1) to n is (a dt); in the same interval dt,
the probability of a transition from n to (n-1) is (b dt).
Therefore in the steady state we must have

$$ \pi_{n-1}\, a\, dt = \pi_n\, b\, dt $$

which means

$$ \pi_n = \frac{a}{b}\, \pi_{n-1} $$

Letting r = a/b we have

$$ \pi_n = r^n \pi_0 $$

The generating function of the π_n is

$$ G(z) = \sum_{n=0}^{\infty} \pi_n z^n = \frac{\pi_0}{1 - rz}, \qquad rz < 1 $$

Since G(1) = 1, we have π_0 = 1-r and thus

$$ \pi_n = r^n (1-r) $$

which verifies eq. 7.1.2. The mean and variance of n are given
by the usual expressions:

$$ \bar{n} = G'(1) = \frac{r}{1-r} $$

$$ \sigma_n^2 = G''(1) + G'(1) - (G'(1))^2 = \frac{r}{(1-r)^2} $$

which verify eqs. 7.1.3.

QED.
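As a numerical check on Theorem 7.1, the closed forms of eqs. 7.1.2 and 7.1.3 can be compared with moments computed directly from the state probabilities. The sketch below assumes a < b; the function names are illustrative.

```python
def state_probability(n, a, b):
    """Steady state probability of state n (eq. 7.1.2), for arrival rate a < service rate b."""
    r = a / b
    if not 0 <= r < 1:
        raise ValueError("steady state requires 0 <= a/b < 1")
    return (r ** n) * (1 - r)


def mean_and_variance(a, b):
    """Mean and variance of the number in the system (eqs. 7.1.3)."""
    r = a / b
    if not 0 <= r < 1:
        raise ValueError("steady state requires 0 <= a/b < 1")
    return r / (1 - r), r / (1 - r) ** 2


def moments_from_series(a, b, terms=200):
    """Mean and variance obtained by summing the first `terms` state probabilities."""
    probs = [state_probability(n, a, b) for n in range(terms)]
    mean = sum(n * p for n, p in enumerate(probs))
    second = sum(n * n * p for n, p in enumerate(probs))
    return mean, second - mean ** 2
```

For moderate r the truncated series and the closed forms agree to many digits; as r approaches 1 more terms are needed, which mirrors the growth of the mean queue length near saturation.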


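Theorem 7.2. Let the random variable y denote the time a job
waits in the queue described above until it enters ser-
vice. Then the probability density function f_y(u) is
given by

$$ f_y(u) = \begin{cases} \dfrac{b-a}{b} & u = 0 \\[1ex] \dfrac{a}{b}\,(b-a)\, e^{-(b-a)u} & u > 0 \end{cases} \tag{7.1.4} $$

and the mean and variance of y are given by

$$ \bar{y} = \frac{a}{b}\cdot\frac{1}{b-a}, \qquad \sigma_y^2 = \frac{a}{b}\Big(2 - \frac{a}{b}\Big)\frac{1}{(b-a)^2} \tag{7.1.5} $$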
Proof: Observe that

$$ y = \begin{cases} 0 & \text{if } n = 0 \quad (\Pr[y=0] = \pi_0 = 1-r) \\ \sum_{i=1}^{n} s_i & \text{otherwise} \end{cases} \tag{7.1.6} $$

Here, s_i is the random variable of service time for one job,
whose density function satisfies

$$ f_{s_i}(u) = b\, e^{-bu}, \qquad u > 0 $$

for each subscript i. Eq. 7.1.6 is obtained by noting: if the
job arrives to find n already in the system it must wait for all
n to complete service.

Suppose n≥1. We want to find f_{y,n}(u), the density function
for y when n are in the system. Observe that

$$ f_{y,n}(u)\, du = \Pr[(n-1) \text{ events in } u \text{ and 1 event in } (u, u+du)] = \frac{(bu)^{n-1}}{(n-1)!}\, e^{-bu}\, (b\, du) $$

so that

$$ f_{y,n}(u) = b\, \frac{(bu)^{n-1}}{(n-1)!}\, e^{-bu} $$

To find f_{y,n≥1}(u), observe

$$ f_{y,n \ge 1}(u) = \sum_{n=1}^{\infty} \pi_n f_{y,n}(u) = \sum_{n=1}^{\infty} r^n (1-r)\, b\, \frac{(bu)^{n-1}}{(n-1)!}\, e^{-bu} $$

$$ = r\, b\, (1-r)\, e^{-bu} \sum_{n=1}^{\infty} \frac{(rbu)^{n-1}}{(n-1)!} = r\, b\, (1-r)\, e^{-b(1-r)u} = \frac{a}{b}\,(b-a)\, e^{-(b-a)u} $$

since r = a/b. Finally,

$$ f_{y,0}(u) = \pi_0 = 1 - r = \frac{b-a}{b} \qquad \text{at } u = 0 $$

Dropping the use of the second subscript,

$$ f_y(u) = \begin{cases} \dfrac{b-a}{b} & u = 0 \\[1ex] \dfrac{a}{b}\,(b-a)\, e^{-(b-a)u} & u > 0 \end{cases} $$

which verifies eq. 7.1.4.

The mean and variance of waiting time are

$$ \bar{y} = \int_0^{\infty} u\, f_y(u)\, du = \frac{a}{b}\cdot\frac{1}{b-a} $$

$$ \sigma_y^2 = \overline{y^2} - \bar{y}^2 = \frac{a}{b}\Big(2 - \frac{a}{b}\Big)\frac{1}{(b-a)^2} $$

QED.
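The waiting-time moments can also be checked by numerical integration of the continuous part of f_y(u); the probability mass at u = 0 contributes nothing to either moment. The sketch below assumes a < b and uses a plain trapezoid rule; the function names and the integration limit are choices made here for illustration.

```python
import math


def waiting_density(u, a, b):
    """Continuous part of the waiting-time density for u > 0 (eq. 7.1.4)."""
    return (a / b) * (b - a) * math.exp(-(b - a) * u)


def waiting_moments(a, b):
    """Closed-form mean and variance of the waiting time, with r = a/b."""
    r = a / b
    mean = r / (b - a)
    variance = r * (2 - r) / (b - a) ** 2
    return mean, variance


def integrate(g, upper, steps=200_000):
    """Composite trapezoid rule for g on the interval [0, upper]."""
    h = upper / steps
    total = 0.5 * (g(0.0) + g(upper))
    for i in range(1, steps):
        total += g(i * h)
    return total * h
```

Integrating `waiting_density` over a long enough interval gives a/b, so that together with the mass (b-a)/b at zero the total probability is one; integrating u and u² against it reproduces the mean and the second moment.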
Theorems 7.1 and 7.2 can be used to find bounds on the
number in the system and on the waiting time. A bound on the
number n in the system can be obtained from

$$ \Pr[n > u] = \sum_{n=u+1}^{\infty} \pi_n = r^{u+1} \tag{7.1.7} $$

and a bound on the waiting time y from

$$ \Pr[y > u] = \int_u^{\infty} f_y(v)\, dv = \frac{a}{b}\, e^{-(b-a)u} \tag{7.1.8} $$
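Eqs. 7.1.7 and 7.1.8 translate directly into tail probabilities. The sketch below evaluates them and, as one illustrative use, finds the smallest queue bound whose overflow probability does not exceed a chosen tolerance; the function names and the tolerance parameter `eps` are assumptions made here.

```python
import math


def prob_more_than(u, a, b):
    """Pr[n > u] = r**(u+1) for integer u >= 0 (eq. 7.1.7)."""
    return (a / b) ** (u + 1)


def prob_wait_exceeds(u, a, b):
    """Pr[y > u] = (a/b) * exp(-(b-a)u) for u >= 0 (eq. 7.1.8)."""
    return (a / b) * math.exp(-(b - a) * u)


def smallest_queue_bound(eps, a, b):
    """Smallest integer u with Pr[n > u] <= eps; requires 0 < eps and a < b."""
    if eps <= 0 or not 0 <= a < b:
        raise ValueError("need eps > 0 and 0 <= a < b")
    u = 0
    while prob_more_than(u, a, b) > eps:
        u += 1
    return u
```

For instance, with a = 2 and b = 5 (r = 0.4), the probability of more than u jobs in the system first drops to 0.01 or below at u = 5.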

7.2. Organization of the Queues


The systems of queues described below are embedded in the 
standby set. They are organized so that the scheduler can quickly 
locate jobs of whatever demand it seeks. They are specifically 
intended for use in contemporary computer systems, in which the 
grave danger of thrashing makes it so overridingly important to 
balance memory. They are the queues for use in the one- 
dimensional case. 

The following discussion illustrates how fairness can be
incorporated into a balance policy. It also establishes bounds
on the total processor and memory requirements needed to accom-
modate the balance set.


7.2.1. An Almost-Continuous System of Queues


Figure 7-2 illustrates a very general, one-dimensional
queueing structure. We assume that jobs (i.e., computations)
are arriving at random, interarrival times exponential with
mean 1/a. Job (working set) sizes are integers s, s ∈ [1,s_0], s_0
being the size of the largest working set. The job size dis-
tribution (of incoming jobs) is

$$ \Pr[s = i] = f_s(i) \tag{7.2.1} $$

and

$$ \sum_{i=1}^{s_0} f_s(i) = 1 \tag{7.2.2} $$

Let Q = {1,...,k,...,s_0} denote the set of queues. A job of
size k is placed at the end of the kth queue.

Figure 7-2. Sorting jobs into size classes in standby set.
(Diagram labels: incoming jobs, size s ∈ [1,s_0]; FCFS queues.)

In our work here, we require that a balance policy be fair.
To accomplish this we have made the individual queues FCFS, and
we will require that the scheduler keep at least one job from
each queue in service. Thus, a job’s waiting time will depend
only on its order of arrival relative to jobs of the same size,
and not on jobs of different size.

Associated with the kth queue is a quantum q_k. The quantum
q_k is assumed to be the same for each job in the kth queue, re-
gardless of its past.

The scheduler controls membership in the balance set B so
that balance is maintained; that is, so that the balance set
demand (p_B,m_B) is kept within close tolerances of the desired
(α,β). Because each job is assigned a quantum, its time in B
is bounded, so the scheduler need only control entries to B
(cf. Figure 6-3).

A job (of size j) may exit the balance set B for one of
three reasons:

1. Its quantum expired, in which case it is entered at
the end of the jth queue.

2. It became disabled, in which case it enters the disabled set.

3. It quit.

In general, a job will fluctuate in size during execution. Thus,
if it is of size k at entry to B, it may be of size j≠k upon
exit from B. We assume a condition of statistical equilibrium,
so that, on the average, a job of size k entering B implies that,
within q_k, some job of size k will exit B.

The arrival rate to the kth queue is

$$ a_k = a\, f_s(k) \tag{7.2.3} $$

and, from eq. 7.2.2,

$$ \sum_{k \in Q} a_k = \sum_{k \in Q} a\, f_s(k) = a \tag{7.2.4} $$

We may assume that 1/a_k is the mean of an exponential distri-
bution, because 1/a is the mean of an exponential distribution and
jobs are statistically independent. The service rate for the
kth queue is b_k, and 1/b_k is the average service time for jobs in
the kth queue. We assume that 1/b_k is the mean of an exponential
distribution.

The analysis of Section 7.1 can be used to estimate the
behavior of each queue when each receives service independently
of the others and one job at a time is serviced from each. In
reality, more than one job from each queue may be in service,
in which case the analysis of Section 7.1 must be interpreted
as the worst-case behavior.

Theorem 7.3. Suppose B acts as a single server to each of the
s_0 queues in Figure 7-2. Let a_k be the arrival rate to
the kth queue, b_k be the service rate of the kth queue,
and let r_k = a_k/b_k. Then the expected processor and memory
demand of the balance set B are:

$$ \bar{p}_B = \sum_{k \in Q} r_k, \qquad \bar{m}_B = \sum_{k \in Q} k\, r_k \tag{7.2.5} $$
Proof: Using the result of Theorem 7.1, let

$$ \pi_{0k} = \Pr[k\text{th subsystem is empty}] = 1 - \frac{a_k}{b_k} = 1 - r_k $$

where the kth subsystem comprises the kth queue and the single
job from it in service. Define the random variable y_k:

$$ y_k = \begin{cases} 1 & k\text{th subsystem non-empty} \\ 0 & \text{otherwise} \end{cases} $$

Thus,

$$ \Pr[y_k = 1] = r_k, \qquad \Pr[y_k = 0] = 1 - r_k $$

Then the random variable p_B of processor demand is

$$ p_B = \sum_{k \in Q} y_k $$

and the random variable m_B of memory demand is

$$ m_B = \sum_{k \in Q} k\, y_k $$

The expectations are:

$$ \bar{p}_B = \sum_{k \in Q} \bar{y}_k = \sum_{k \in Q} \Pr[y_k = 1] = \sum_{k \in Q} r_k $$

$$ \bar{m}_B = \sum_{k \in Q} k\, \bar{y}_k = \sum_{k \in Q} k \Pr[y_k = 1] = \sum_{k \in Q} k\, r_k $$

QED.
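Eqs. 7.2.5 are easy to evaluate for a given size distribution. The sketch below assumes the distribution f_s and the service rates b_k are supplied as dictionaries keyed by job size; the function name and argument layout are illustrative, not part of the analysis.

```python
def expected_demand(a, size_pmf, service_rate):
    """Expected balance set demands (eqs. 7.2.5).

    a            -- total arrival rate
    size_pmf     -- dict mapping job size k to f_s(k)
    service_rate -- dict mapping job size k to b_k
    Returns (expected processor demand, expected memory demand).
    """
    p_b = 0.0
    m_b = 0.0
    for k, f in size_pmf.items():
        r_k = a * f / service_rate[k]  # r_k = a_k / b_k with a_k = a * f_s(k)
        if r_k >= 1:
            raise ValueError(f"queue {k} is saturated (r_k = {r_k})")
        p_b += r_k
        m_b += k * r_k
    return p_b, m_b
```

With a = 1, sizes {1, 2, 4} of probability {0.5, 0.3, 0.2} and service rates {2.0, 1.0, 0.5}, the utilizations are 0.25, 0.3 and 0.4, giving an expected processor demand of 0.95 and an expected memory demand of 2.45 pages.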


There is an interesting special case, in which the running
time of a job of size k is inversely proportional to the prob-
ability f_s(k):

$$ (\text{mean running time})_k = \frac{1}{b_k} = \frac{1}{b\, f_s(k)} \tag{7.2.6} $$

for some constant b. This behavior may in fact occur in some
real situations. For example, the quantum q_k could be chosen
to be:

$$ q_k = \frac{1}{b\, f_s(k)} \tag{7.2.7} $$

In this case,

$$ r_k = \frac{a_k}{b_k} = \frac{a\, f_s(k)}{b\, f_s(k)} = \frac{a}{b} = r \tag{7.2.8} $$

That is, r_k = r is constant for all the queues.

Theorem 7.4. Suppose the conditions of Theorem 7.3 and eq. 7.2.7
hold. Then the expected processor and memory demands of
the balance set are:

$$ \bar{p}_B = s_0\, r, \qquad \bar{m}_B = \frac{s_0 (s_0 + 1)}{2}\, r \tag{7.2.9} $$

Furthermore, when demand is not too high, that is r<<1
(the queues are sparsely populated), then

$$ \bar{m}_B \ll \frac{s_0^2}{2} \tag{7.2.10} $$

Proof: Eqs. 7.2.9 follow directly from eqs. 7.2.5 with r_k = r.
If r<<1, then s_0 r << s_0, and we have

$$ \bar{m}_B = \frac{s_0 (s_0 + 1)}{2}\, r = \frac{(s_0 r)(s_0 + 1)}{2} \ll \frac{s_0^2}{2} $$

QED.
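As a worked instance of eqs. 7.2.9 and 7.2.10, take the illustrative values s_0 = 100 and r = 0.05 (numbers assumed here, not taken from the text):

$$ \bar{p}_B = s_0\, r = 100 \times 0.05 = 5, \qquad \bar{m}_B = \frac{s_0 (s_0 + 1)}{2}\, r = \frac{100 \times 101}{2} \times 0.05 = 252.5 $$

By comparison, s_0^2/2 = 5000, so the expected memory demand is about one twentieth of that figure, in line with the factor r.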


It is interesting to note that, when s_0 is large, the distri-
bution f_s(i) may be approximated by a continuous density func-
tion f_s(u) if we regard the range [1,s_0] of job sizes as being
continuous. In this case we may regard the set of queues, Q,
as being a continuum of queues, and use the notation Q(u) to
denote the queue into which jobs of size u are arriving. The
arrival rate to Q(u) is:

$$ a\, f_s(u), \qquad u \in [1, s_0] \tag{7.2.11} $$

Let b_u denote the service rate of size u jobs, and then

$$ r_u = \frac{a\, f_s(u)}{b_u} \tag{7.2.12} $$

By analogy with eqs. 7.2.5,

$$ \bar{p}_B = \int_1^{s_0} r_u\, du, \qquad \bar{m}_B = \int_1^{s_0} u\, r_u\, du \tag{7.2.13} $$

Again, if r_u = r is constant, we obtain the same results as eqs.
7.2.9.


7.2.2. The Logarithmic Queue


When the size s_0 of the largest job is large, it may become
impractical to implement a large network of queues, such as that
of Figure 7-2. Indeed, when demand is not too high, the prob-
ability that queue k is non-empty is small; such a queue struc-
ture would comprise a mostly-empty set of queues.

A general approach to the problem of reducing the number
of empty queues is to establish classes of comparable-size jobs,
and to sort jobs entering the standby set into a system of FCFS
queues, one for each class. Suppose the number of classes is
chosen to be K. Then we must choose K size-intervals (s_{k-1},s_k)
in order to define the classes S_k:

$$ S_k = \{\, s \mid s \text{ is the size of some job in the interval } (s_{k-1}, s_k) \,\} \tag{7.2.14} $$

Figure 7-2 may be regarded as s_0 classes with S_k = {k}, and
if we choose K<s_0 it is not hard to see that the total expected
balance set demands (p_B,m_B) will be smaller (under the conditions
of Theorem 7.3), because more work will be allowed to pile up
in the (smaller number of) queues.

One method for choosing the boundaries of the classes S_k
is to make the arrival rate into each class be the same:

$$ a_k = \frac{a}{K} = \sum_{i \in S_k} a\, f_s(i) \tag{7.2.15} $$

where f_s(i) is the probability Pr[s=i]. A much more interesting
method of sorting, the logarithmic queue, has particularly use-
ful properties.

The structure of the logarithmic queue is shown in Figure 7-3.
Jobs are sorted by size into one of [log_2 s_0] FCFS queues (here,
the notation [x] means the greatest integer i≤x). The classes
are defined to be

$$ S_k = \{\, s \mid s \in [2^k, 2^{k+1}) \,\}, \qquad k \in Q \tag{7.2.16} $$

where Q = {1,...,k,...,[log_2 s_0]} is the set of queues. When
a job of size s enters the standby set, it is placed at the end
of queue k = [log_2 s].

Figure 7-3. The logarithmic queue. (Diagram labels: incoming
jobs, size s ∈ [1,s_0]; FCFS queues, one per class S_k; balance
set B.)
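The sorting rule k = [log_2 s] is simple to compute exactly for integer sizes. The sketch below is one way to do it; the function names are illustrative, and note as an assumption of this sketch that a job of size 1 is placed in a queue numbered 0, although the set Q as printed starts at 1.

```python
from collections import defaultdict, deque


def queue_index(size):
    """Return [log2(size)], the greatest integer k with 2**k <= size."""
    if size < 1:
        raise ValueError("job size must be a positive integer")
    # For a positive int, bit_length() - 1 equals floor(log2(size)) exactly,
    # with none of the rounding risk of floating-point logarithms.
    return size.bit_length() - 1


def sort_into_queues(sizes):
    """Place jobs (identified by arrival order) at the end of their FCFS queue."""
    queues = defaultdict(deque)
    for job_id, size in enumerate(sizes):
        queues[queue_index(size)].append((job_id, size))
    return queues
```

Sizes 2 and 3 fall in queue 1, sizes 4 through 7 in queue 2, and so on, so each queue holds jobs whose sizes differ by less than a factor of two.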

With the kth queue is associated a pair (q_k,s̄_k), q_k being
the time quantum and s̄_k being the typical size of a queue k job.
The probability distribution f_k(i) for jobs in class S_k is

$$ f_k(i) = \begin{cases} \dfrac{f_s(i)}{\sum_{j \in S_k} f_s(j)} & \text{if } i \in S_k \\[1ex] 0 & \text{otherwise} \end{cases} \tag{7.2.17} $$

The average job size in class S_k is:

$$ \bar{s}_k = \sum_{i \in S_k} i\, f_k(i) \tag{7.2.18} $$

and we may regard s̄_k as being typical of the jobs in class S_k.
The arrival rate to the kth class is:

$$ a_k = \sum_{i \in S_k} a\, f_s(i) \qquad \text{and} \qquad \sum_{k \in Q} a_k = a \tag{7.2.19} $$

Again, 1/a_k is assumed to be the mean of an exponential distri-
bution. The service rate for the kth class is:

$$ b_k = \sum_{i \in S_k} b(i)\, f_k(i) \tag{7.2.20} $$

which is the average over S_k of the rate b(i) of each job size
i in S_k. Again we make the approximation that 1/b_k is the mean
of an exponential distribution, so that we may use Theorems 7.1
and 7.2 to provide upper bounds on queue lengths and waiting times.
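The per-class quantities of eqs. 7.2.17 through 7.2.20 can be computed in one pass over the size distribution. In this sketch the distribution is a dictionary from size to f_s(i), the per-size service rate b(i) is passed as a function, and classes are formed with k = [log_2 s]; these representations and the function name are assumptions made for illustration.

```python
def class_statistics(a, size_pmf, rate):
    """Return {k: (a_k, mean size, b_k)} for the logarithmic classes S_k.

    a        -- total arrival rate
    size_pmf -- dict mapping integer job size i to f_s(i)
    rate     -- function giving the service rate b(i) of a size i job
    """
    classes = {}
    for size, f in size_pmf.items():
        classes.setdefault(size.bit_length() - 1, []).append((size, f))

    stats = {}
    for k, members in classes.items():
        weight = sum(f for _, f in members)  # normalizer in eq. 7.2.17
        a_k = a * weight  # eq. 7.2.19
        mean_size = sum(s * f for s, f in members) / weight  # eq. 7.2.18
        b_k = sum(rate(s) * f for s, f in members) / weight  # eq. 7.2.20
        stats[k] = (a_k, mean_size, b_k)
    return stats
```

The ratio a_k/b_k of the first and third entries then plays the role of r_k when Theorems 7.1 and 7.2 are applied to class k.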


Theorem 7.5. Suppose the logarithmic queue structure described
above is used, and that one job from each class is in ser-
vice. Then the processor and memory demands of the balance
set are bounded by:

$$ p_B \le [\log_2 s_0] \quad \text{(single-process computations)}, \qquad m_B < 2 s_0 \tag{7.2.21} $$

where s_0 is the size of the largest job.


Proof: If one single-process computation at a time is in B
from each class, then at most [log_2 s_0] processes can be de-
manding a processor. In class S_k the largest job is of size
(2^{k+1}-1); then,


$$ m_B < \sum_{k=1}^{(\log_2 s_0)-1} (2^{k+1} - 1) < \sum_{k=1}^{(\log_2 s_0)-1} 2^{k+1} = 4 \sum_{k=0}^{(\log_2 s_0)-2} 2^k $$

or

$$ m_B < 4\,\big(2^{(\log_2 s_0)-1} - 1\big) < 2 s_0 $$
QED. 


Thus, a memory of size 2s_0, a logarithmic queue, and
[log_2 s_0] processors are sufficient to guarantee service to
one job from each class.

The advantages of the logarithmic queue are:

1. Fairness. One job from each class S_k is guaranteed
service, and jobs from each class are serviced in order
of arrival.

2. Ability to scale. The boundaries of the classes S_k are
invariant to s_0, except for the upper boundary of class
S_K, K = [log_2 s_0].

3. Small number of classes. Unless s_0 is small, it is true
that [log_2 s_0] << s_0.

4. Small processor and memory requirements. From Theorem 7.5,
no more than 2s_0 pages of main memory are needed, and
no more than [log_2 s_0] processors are needed, to accom-
modate the balance set B.

5. Flexibility. Suppose an imbalance of size s appears
in the balance set memory demand, and that queue j
is the queue in which size s jobs reside. If the
scheduler finds queue j empty, it may still satisfy
the imbalance with 2 jobs from queue (j-1), or 4 jobs
from queue (j-2), etc.
7.3. Mathematical Programming Problem, One-Dimensional Case


We shall formulate mathematical programming problems whose 
solutions are balance policies. 

We require three things of balance policies: maintenance
of balance, fairness, and the ability to satisfy other objectives 
such as minimum response time. Maintenance of balance is achieved 
by the constraints in the problems, fairness is achieved by making 
the mathematical programming problem operate in conjunction with 
queue structures of the types discussed in Section 7.2, and other 
objectives may be expressed as objective functions in the pro- 
gramming problems. We leave the particular objective function 
unspecified, the final choice being up to the policy designer. 

In the remainder of this section we formulate the problem, 
review alternatives for the objective function, prove a theorem 
that constrains the choice of quanta, present a solution in 
the case of memory balance with minimum-response-time objective, 
and finally discuss briefly why the formulations cannot lead to 


completely optimum solutions. 


7.3.1. The Problem 

A decision point is a real time instant at which the scheduler
is called on to rebalance the system. Suppose that the
balance set demand is $(p_B, m_B)$ at a decision point. Define the
imbalance in B to be:

(7.3.1)   $(\delta_p^B, \delta_m^B) = (\alpha, \beta) - (p_B, m_B) = (\alpha - p_B,\ \beta - m_B)$

We assume that the scheduler is called on to admit jobs
to B, never to remove jobs from B. On the one hand, since a
job's quantum bounds its time in B, the demand $(p_B, m_B)$ must
eventually fall below $(\alpha, \beta)$. On the other hand, the
parameters $\alpha$ and $\beta$ are chosen to leave $(1-\alpha)N$ processors
and $(1-\beta)M$ memory pages available for unanticipated expansions.

When do the decision points occur? This is basically a 
decision to be made by the policy designer. Possibilities are: 
decision points occur at regular, clocked intervals; they occur 
whenever a job exits the balance set B; or they occur whenever 
the imbalance exceeds some threshold. 


Define the following parameters:

$Q = \{1,2,\ldots,K\}$ is the set of job-class indices.

$n_k$ = number of jobs from class $S_k$ selected by the scheduler
        to enter B.

$n_k^B$ = number of jobs from class $S_k$ already in B.

$s_k$ = typical size of job in class $S_k$.

$q_k$ = quantum assigned to jobs in class $S_k$.

$\alpha, \beta$ = balance parameters.

$N, M$ = number of processors, number of main memory pages.

$A$ = standard interval used to define processor demand
      in the single-process case (Section 6.2.2).

$\eta_0$ = minimum tolerable duty factor for each computation.

Problem definition. Find integers $\{n_k\}_{k \in Q}$ such that the
objective

           $F(n_1,\ldots,n_K)$

is extremized, and the constraints

(7.3.2)   $\{\, n_k + n_k^B \ge 1 \,\}_{k \in Q}$

(7.3.3)   $\sum_{k \in Q} n_k s_k \le M \delta_m^B$

(7.3.4)   $\sum_{k \in Q} n_k q_k \le \frac{N A}{\eta_0}\, \delta_p^B$

are satisfied.


The choice of objective function $F(n_1,\ldots,n_K)$ is discussed
below. The constraint of eq. 7.3.2 means that at least one job
from each class shall be in service. The constraint of eq. 7.3.3
asserts that the total memory requirement of jobs admitted to B
shall not exceed the imbalance of $M \delta_m^B$ pages of memory. The
constraint of eq. 7.3.4 asserts that the total processor
requirement of jobs admitted to B shall not exceed the imbalance of
$\delta_p^B$ processors. We have divided by the duty factor $\eta_0$ because
if each job has minimum duty factor $\eta_0$, then N processors may
appear as $N/\eta_0$ processors; see Section 7.3.3.
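
The three constraints can be checked mechanically for any candidate
admission vector. The following is a minimal Python sketch, not part
of the original formulation: the names JobClass and is_feasible are
hypothetical, and the numeric values in the usage lines are
illustrative assumptions only.

```python
from dataclasses import dataclass

@dataclass
class JobClass:
    size: float       # s_k, typical job size in pages
    quantum: float    # q_k, quantum assigned to the class
    in_balance: int   # n_k^B, jobs of this class already in B

def is_feasible(n, classes, M, N, A, eta0, delta_p, delta_m):
    """Return True if admission vector n satisfies eqs. 7.3.2-7.3.4."""
    # (7.3.2) every class has at least one job in service
    if any(nk + c.in_balance < 1 for nk, c in zip(n, classes)):
        return False
    # (7.3.3) admitted memory does not exceed M * delta_m pages
    if sum(nk * c.size for nk, c in zip(n, classes)) > M * delta_m:
        return False
    # (7.3.4) admitted quanta do not exceed (N * A / eta0) * delta_p
    if sum(nk * c.quantum for nk, c in zip(n, classes)) > (N * A / eta0) * delta_p:
        return False
    return True

# Illustrative values (assumptions): two classes, 10 free pages.
classes = [JobClass(size=2, quantum=4, in_balance=0),
           JobClass(size=4, quantum=8, in_balance=1)]
params = dict(M=40, N=4, A=10, eta0=0.5, delta_p=0.5, delta_m=0.25)
print(is_feasible([1, 1], classes, **params))
print(is_feasible([0, 1], classes, **params))
```

A checker of this kind says nothing about optimality; it only
delimits the set of admissions among which the objective function
chooses.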


7.3.2. The Objective Function

The objective function $F(n_1,\ldots,n_K)$ which is to be extremized
(i.e., maximized or minimized) is to be specified by the
policy designer. Some possibilities are, in order of complexity:

1. Minimize total waiting time. At a decision point, let
   $N_k$ denote the number of jobs in the $k$th queue. Then
   the wait of the job at the end of the $k$th queue (before
   entering service) is $(N_k - n_k) q_k$. The objective becomes

      minimize $F(n_1,\ldots,n_K) = \sum_{k \in Q} (N_k - n_k)\, q_k
                                 = \sum_{k \in Q} N_k q_k - \sum_{k \in Q} n_k q_k$

   but $N_k$ is a constant at a decision point, so we have
   the simpler objective

(7.3.5)   maximize $\sum_{k \in Q} n_k q_k$

   We shall use eq. 7.3.5 as the expression of a minimum
   response time objective.

2. Minimize weighted sum of waiting times. Let $c_1,\ldots,c_K$
   be a set of weights (relative importances) of the
   waiting times in each queue. Then (by analogy with
   eq. 7.3.5) the objective becomes

(7.3.6)   maximize $\sum_{k \in Q} c_k n_k q_k$

3. Minimize weighted sum of functions of waiting times.
   Let $g_1,\ldots,g_K$ be a set of (cost) functions associated
   with the wait of the job at the end of each queue, and
   $c_1,\ldots,c_K$ are weights. Then the objective is

(7.3.7)   minimize $\sum_{k \in Q} c_k\, g_k(N_k - n_k)$

These alternatives are meant only to illustrate possibilities,
not to exhaust them.
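
The equivalence between minimizing total wait and maximizing the
admitted quanta (eq. 7.3.5) can be seen numerically. The sketch below
is illustrative only; the function names and the sample queue lengths
and quanta are assumptions, not taken from the text.

```python
def total_wait(N_queue, n, q):
    """Sum over classes of (N_k - n_k) * q_k, the quantity minimized in option 1."""
    return sum((Nk - nk) * qk for Nk, nk, qk in zip(N_queue, n, q))

def admitted_quanta(n, q, c=None):
    """Objective of eq. 7.3.5 (c is None) or its weighted form, eq. 7.3.6."""
    if c is None:
        c = [1] * len(n)
    return sum(ck * nk * qk for ck, nk, qk in zip(c, n, q))

# Illustrative data (assumptions): two queues.
N_queue, q = [3, 2], [4, 8]
constant = sum(Nk * qk for Nk, qk in zip(N_queue, q))   # fixed at a decision point
for n in ([1, 1], [2, 0]):
    # total wait and admitted quanta always sum to the same constant
    print(n, total_wait(N_queue, n, q), admitted_quanta(n, q), constant)
```

Because the two quantities always sum to a constant at a given
decision point, any admission that maximizes one minimizes the other.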
7.3.3. Choice of Quanta
Efficiency requirements place an important constraint on
the value of the quanta that may be chosen.

Theorem 7.6. Suppose $\eta_0$ is given, where $0 < \eta_0 < 1$, and we want
the duty factor $\eta$ to satisfy $\eta \ge \eta_0$ for all jobs, regardless
of size. Then the quantum must be at least linearly
proportional to the job size. Furthermore, not every value of
$\eta_0$ in the interval [0,1] is attainable for a given choice
of the working set parameter $\tau$.

Proof: Let $s_k$ be the size of jobs in the $k$th class, and $q_k$ be
their quantum. Let $\lambda(\tau)$ be the missing-page probability (Sections
4.2 and 5.4). In a virtual time interval of length $q_k$, the
process encounters $\lambda(\tau) q_k$ page waits. In addition, at the
start of the quantum $q_k$, the working set must be demand-paged
into main memory (assuming it is not already there), requiring
an additional $s_k$ page waits. Therefore the duty factor across
the quantum must satisfy:

(7.3.8)   $\eta = \frac{q_k}{q_k + \lambda(\tau)\, q_k T + s_k T} \ge \eta_0$

Solving eq. 7.3.8 for $q_k$ we find

           $q_k \ge \frac{s_k\, \eta_0\, T}{1 - \eta_0 - \eta_0 \lambda(\tau) T}$

For the given $\eta_0$, in order that $q_k$ be finite, we must have

(7.3.9)   $1 - \eta_0 - \eta_0 \lambda(\tau) T > 0$

(For example, $\eta_0 = 0.5$ requires $\lambda(\tau) T < 1$, and we must have
$\lambda(\tau) T \ll 1$ if $q_k$ is to be reasonably small.) Once $\eta_0$ and $\tau$ have
been fixed, the quantity

           $C = \frac{\eta_0\, T}{1 - \eta_0 - \eta_0 \lambda(\tau) T}$

is fixed, and we have

           $q_k \ge C s_k$

which was to be shown. In other words, if $q_k < C s_k$, the actual
value of $\eta$ (eq. 7.3.8) cannot satisfy $\eta \ge \eta_0$.

To show that not every value of $\eta_0$ in the interval [0,1]
is attainable for a given choice of $\tau$, let $\tau$ be given and solve
eq. 7.3.9 for $\eta_0$:

           $\eta_0 < \frac{1}{1 + \lambda(\tau) T}$

Thus, $\eta_0$ is upper bounded. Compare with the result of Theorem
4.6, which gives this expression as the steady state duty factor
when quantum starts and expirations are ignored.

QED.

Theorem 7.6 tells us that if we wish to achieve a certain level
of processing efficiency, we must be willing to associate larger
quanta with larger jobs.
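
The bound of Theorem 7.6 is easy to evaluate numerically. The Python
sketch below is a hedged illustration: the function names are
hypothetical, lam stands for the missing-page probability of the
proof, and the numbers in the usage lines are assumptions chosen only
to exercise the formulas.

```python
def max_duty_factor(lam, T):
    """Upper bound on eta_0 obtained from eq. 7.3.9: 1 / (1 + lam * T)."""
    return 1.0 / (1.0 + lam * T)

def min_quantum(s_k, eta0, lam, T):
    """Smallest quantum q_k = C * s_k for which eq. 7.3.8 yields eta >= eta0."""
    denom = 1.0 - eta0 - eta0 * lam * T
    if denom <= 0:
        raise ValueError("eta0 is not attainable for this lam and T (eq. 7.3.9)")
    return s_k * eta0 * T / denom

def duty_factor(q_k, s_k, lam, T):
    """Duty factor across one quantum, eq. 7.3.8."""
    return q_k / (q_k + lam * q_k * T + s_k * T)

# Illustrative values (assumptions): lam * T = 0.1.
lam, T, eta0 = 0.001, 100.0, 0.5
for s_k in (10, 20, 40):
    q_k = min_quantum(s_k, eta0, lam, T)
    print(s_k, q_k, duty_factor(q_k, s_k, lam, T))
print(max_duty_factor(lam, T))
```

Doubling the job size doubles the minimum quantum, which is the
linear proportionality the theorem asserts; requesting a duty factor
above the bound raises an error rather than returning a negative
quantum.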
7.3.4. Solution to the Memory Balance Problem


The most important one-dimensional case is the memory- 
balance case, because it finds application immediately in con- 
temporary computer systems. In this case we place much emphasis 
on balancing memory (to avoid thrashing) and little emphasis on 
balancing processor. If the objective function is to minimize 
response time, there is a rather elegant algorithm to find the 
solution to the mathematical programming problem. 

The linear programming problem presented in Section 7.3.1 
very much resembles a classic problem, the knapsack problem [D0]. 
We are given a collection of objects, each having a certain 
weight and a certain value, and we are to pack them into a knap- 
sack such that a given weight limit is not exceeded and the total 
value of objects packed is maximum. The solution to this prob- 
lem gives insight into the nature of the solution to the memory- 


balance problem. Formally stated, the knapsack problem is: 


The Knapsack Problem. Let Q be a class of object types, $N_k$ be
the number of objects of type k, $w_k$ be the weight of a type
k object, and $v_k$ be the value of a type k object. We are
to find integers $\{n_k\}_{k \in Q}$ such that the objective

           maximize $\sum_{k \in Q} n_k v_k$

is achieved and the constraints

           $\{\, 0 \le n_k \le N_k \,\}_{k \in Q}$
           $\sum_{k \in Q} n_k w_k \le W$

are satisfied, where W is a given positive number.

Theorem 7.7. Suppose in the knapsack problem we have ordered
the elements of Q such that

           $\frac{v_1}{w_1} > \frac{v_2}{w_2} > \cdots > \frac{v_k}{w_k} > \cdots, \quad k \in Q$

and suppose no two classes have the same $v_k / w_k$ ratio. Then
the optimum solution is:

   for k = 1,2,3,... do:
      choose the largest $n_k$ such that $0 \le n_k \le N_k$ and
      $\sum_{k \in Q} n_k w_k \le W$

Proof: The details can be found in Dantzig [D0], but the idea
is very simple. The ratios $v_k / w_k$ may be interpreted as the value
per pound weight of each object. The algorithm merely attempts
to reach the weight limit W by packing in all the objects with
the highest values per unit weight.

QED.
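
The procedure of Theorem 7.7 can be sketched in a few lines of
Python. This is an illustration under assumptions: the function name
and the (value, weight, available) tuple layout are hypothetical, and
the sample data are invented. Packing by value-to-weight ratio is
exact for the continuous (fractional) form of the problem; for
arbitrary integer instances it should be read as a heuristic.

```python
def greedy_knapsack(items, W):
    """
    items: list of (value v_k, weight w_k, available N_k) per object type.
    Returns the number n_k of objects packed from each type, visiting
    types in order of decreasing value per unit weight.
    """
    order = sorted(range(len(items)),
                   key=lambda k: items[k][0] / items[k][1],
                   reverse=True)
    counts = [0] * len(items)
    remaining = W
    for k in order:
        value, weight, available = items[k]
        # largest n_k that respects both the supply and the weight limit
        take = min(available, int(remaining // weight))
        counts[k] = take
        remaining -= take * weight
    return counts

# Illustrative data (assumptions): ratios 5, 2 and 1.
items = [(10, 2, 3), (6, 3, 2), (1, 1, 5)]
print(greedy_knapsack(items, 10))
```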


In the general case, we would have to assume that the
ratios satisfy

           $\frac{v_1}{w_1} \ge \frac{v_2}{w_2} \ge \cdots \ge \frac{v_k}{w_k} \ge \cdots, \quad k \in Q$

instead of the strictly decreasing situation given in Theorem 7.7.
This complicates the algorithm. Since we are about to apply
it to the memory balance problem, which will satisfy a constraint
similar to that in Theorem 7.7, we shall not concern ourselves
further with refinements of the algorithm. Additional solution
methods are found in Dantzig [D0].


The Memory Balance Problem. Let Q be a set of queues, as
discussed earlier. Let $s_k$ be the typical size of jobs of type
k, $q_k$ be the quantum associated with type k jobs, and $N_k$
be the number in the $k$th queue at a decision point. We
are to find integers $\{n_k\}_{k \in Q}$ such that the objective

           maximize $\sum_{k \in Q} n_k q_k$

is achieved and these constraints are satisfied:

           $\{\, 0 \le n_k \le N_k \,\}_{k \in Q}$
           $\{\, n_k + n_k^B \ge 1 \,\}_{k \in Q}$
           $\sum_{k \in Q} n_k s_k \le M \delta_m^B$

where M is the main memory size, $n_k^B$ is the number of type k
jobs in the balance set B, $s_k$ is the size of a type k job,
and $\delta_m^B$ is the memory imbalance of B.


The objective function, which minimizes response time, is the 
same as eq. 7.3.5. There is one constraint more than in the 
knapsack problem, namely that the balance set B shall contain 
at least one object of each class. There is no explicit proc- 
essor balance condition because we assume a sufficiency of pro- 
cessors. We shall see in Chapter 8 that we can properly match 
processor and memory resources to the given program mix; thus 
we may suppose there are enough processors as long as we abide 
by the memory constraint and do not change the program mix. 

The solution to the memory balance problem is given in the
next theorem.

Theorem 7.8. The solution to the memory balance problem is as
follows. Let $Q = \{1,2,\ldots,K\}$ be the set of queue indices.
Choose the quanta $\{q_k\}$ to satisfy

           $\frac{q_K}{s_K} > \cdots > \frac{q_2}{s_2} > \frac{q_1}{s_1}$

Let $R = \{k \in Q \mid n_k^B = 0\} = \{k_1, k_2, \ldots, k_r \mid k_1 > k_2 > \cdots > k_r\}$. Then:

1. for j = 1,2,...,r do:
      if $\sum_{i=1}^{j} s_{k_i} > M \delta_m^B$ then goto step 3 else $n_{k_j} = 1$;

2. for k = K,...,2,1 do: choose the largest $n_k$ such that
      $0 \le n_k \le N_k$
      $\sum_{k \in Q} n_k s_k + \sum_{j \in R} s_j \le M \delta_m^B$

3. done.

Proof: Follows at once by analogy with Theorem 7.7.

QED.


In words, Theorem 7.8 says: first satisfy the one-job- 
from-a-class constraint, then keep on admitting the largest 
jobs possible until memory is full. If step 1 fails to admit 
jobs to the balance set, step 2 is bypassed; thus, resources 
are reserved for these jobs, in readiness for the next decision 


point. 
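
The two steps just described can be sketched in Python as follows.
This is one reading of the procedure, not a definitive
implementation: the function name, the zero-based class indexing
(smallest class first), the treatment of counts as totals over both
steps, and the decision to skip unrepresented classes whose queues
are empty are all assumptions made for the illustration.

```python
def memory_balance(sizes, queue_len, in_balance, capacity):
    """
    sizes[k]      : s_k, job size of class k (classes ordered smallest first)
    queue_len[k]  : N_k, jobs waiting in queue k
    in_balance[k] : n_k^B, jobs of class k already in the balance set
    capacity      : M * delta_m^B, free pages at this decision point
    Returns n[k], the number of jobs admitted from each class.
    """
    K = len(sizes)
    n = [0] * K
    used = 0
    # Step 1: one job from every unrepresented class, largest class first.
    unrepresented = [k for k in reversed(range(K))
                     if in_balance[k] == 0 and queue_len[k] > 0]
    for k in unrepresented:
        if used + sizes[k] > capacity:
            return n   # step 3: bypass step 2, keep the space in reserve
        n[k] = 1
        used += sizes[k]
    # Step 2: fill the remaining space, largest class first.
    for k in reversed(range(K)):
        extra = min(queue_len[k] - n[k], int((capacity - used) // sizes[k]))
        n[k] += extra
        used += extra * sizes[k]
    return n

# Illustrative data (assumptions): four classes of sizes 1, 2, 4, 8 pages.
print(memory_balance([1, 2, 4, 8], [5, 3, 2, 1], [1, 0, 0, 1], capacity=12))
print(memory_balance([1, 2, 4, 8], [5, 3, 2, 1], [1, 0, 0, 1], capacity=5))
```

In the second call the one-job-from-each-class step cannot be
completed, so nothing further is admitted and the remaining page is
left free for the next decision point.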
7.3.5. On the Optimality of the Solutions

At each decision point the scheduler finds a solution to 
the mathematical programming problem; but this solution is op- 
timum only with respect to the decision point at which it was 
made. 

Put another way, if the scheduler had a complete listing
of balance set program sizes, together with their completion
times, it might very well want to make a decision different from
that which satisfies the mathematical programming problem. A
decision which appears to satisfy the objective function at time
$t_1$ may turn out to be poorer across an interval $(t_1, t_2)$ than a
decision which appears not to satisfy the objective function at
time $t_1$.

Thus, all we can claim about these mathematical programming
formulations of balance policies is that they produce solutions
which are optimum (with respect to the given objective function)
across short time intervals, but not necessarily across long time
intervals.

We do not feel that this is a serious difficulty. The main 
function of these policies is to keep the computer system balanced 
under the given criteria of fairness. The objective functions 
incorporated into the mathematical programming problems are there 
to accomplish ancillary objectives, namely those beyond balance 
and fairness. Thus, it is not of major importance that the policy 
is only locally optimum with respect to the objective function. 

Of far greater import is: it is possible, under fair balance 
policies, to establish reasonable policies with respect to cri- 


teria such as minimum response time, without requiring exorbitant 


amounts of processor and memory resources. 
7.4. Mathematical Programming Problem, Two-Dimensional Case


We briefly generalize the ideas of the previous section, 
formulating the mathematical programming problem for the two- 
dimensional case, in which it is important to balance both pro- 
cessor and memory equally well. The formulation is very general 
and is presented in the continuous case. Each job in the standby 
set is now a multiprocess computation. 

Consider again the demand space V, illustrated in Figure 7-4,
where the region Q in the unit square is regarded as being a
continuous two-dimensional queue. A demand is a point (u,v);
demands may appear only in the region Q. The demand density
function $f_{pm}(u,v)$ is two-dimensional; that is, the probability
a demand (p,m) falls in a differential region of area (du dv)
at the point (u,v) is given by $f_{pm}(u,v)\, du\, dv$, and

(7.4.1)   $\iint_Q f_{pm}(u,v)\, du\, dv = 1$

Again, $a$ is the (exponential) arrival rate of demands into
the standby set. The rate to the queue at the point (u,v) is

(7.4.2)   $a(u,v) = a\, f_{pm}(u,v)$

The rate at which jobs leave the queue at the point (u,v) is

(7.4.3)   $b(u,v)$

Therefore

(7.4.4)   $\Pr[\text{queue } (u,v) \text{ non-empty}] = r(u,v) = \frac{a(u,v)}{b(u,v)}$


following Theorem 7.1. Assuming that each queue is treated
independently, under a FCFS policy, the expected processor demand
of the balance set is

(7.4.5)   $p_B = \iint_Q u\, r(u,v)\, du\, dv$

and the expected memory demand of the balance set is

(7.4.6)   $m_B = \iint_Q v\, r(u,v)\, du\, dv$

These equations are obtained by noting the expected contribution
to processor demand from the queue at (u,v) is

(7.4.7)   $u \cdot \Pr[\text{queue } (u,v) \text{ non-empty}] = u\, r(u,v)$

and the expected contribution to memory demand from the queue
(u,v) is

(7.4.8)   $v \cdot \Pr[\text{queue } (u,v) \text{ non-empty}] = v\, r(u,v)$

[Figure 7-4. Demand space as a continuous two-dimensional queue.]

In the special case that r(u,v)=r is constant everywhere in Q: 
(7.4.9) Ph = cs = . 


To set up the mathematical programming problem, we define
the following quantities:

$n(u,v)$ = number of jobs to be chosen from queue (u,v) at
           a decision point to enter the balance set B.
           Note that $n(u,v) \ge 0$ is a continuous distribution.

$q(u,v)$ = quantum to be allocated to a job from queue (u,v).
           We assume q(u,v) depends only on (u,v). Since
           a job at (u,v) is a multiprocess computation,
           q(u,v) represents the total virtual time allocated,
           in a pool, to all the processes in the computation.

$u$ = processor demand at queue (u,v).

$v$ = memory demand at queue (u,v).

$w(u,v)$ = waiting time of job at end of queue (u,v) until
           it enters service.

$g(u,v,x)$ = cost function associated with queue (u,v)
             when the waiting time there is x.

$(\delta_p^B, \delta_m^B) = (\alpha, \beta) - (p_B, m_B)$ is the degree of imbalance.

The Problem. Let $(p_B, m_B)$ be the balance set demand at a
decision point. We are to find a distribution n(u,v) of jobs
to enter B, such that the objective

           $\iint_Q n(u,v)\, g(u,v,w(u,v))\, du\, dv$

is minimized, and the constraints

           $\iint_Q n(u,v)\, u\, du\, dv \le N \delta_p^B$
           $\iint_Q n(u,v)\, v\, du\, dv \le M \delta_m^B$

are satisfied.

We shall not attempt to discuss implementation issues here
as we have done for the one-dimensional programming problem in
Section 7.3.4. The solution n(u,v) to this problem is complicated
by the path effect, discussed in Section 6.5.1. We leave
this as an area of future research.
7.5. Summary


The major result of this chapter is: the balance policy to
be used by the scheduler can be expressed as the solution to a
mathematical programming problem.

The solutions produced by these formulations are optimum 
with respect to the objective function across short time inter- 
vals but not necessarily across long time intervals. These 
policies are supposed primarily to be equitable balance policies, 
secondarily to insure reasonable policies toward criteria such 
as minimum response time; thus, these policies meet the objectives 
of this thesis work. 

We showed that there exists a simple, elegant algorithm, 
which finds the optimum set of jobs to admit to the balance set 
at a decision point, which is to be used in the memory balance 
case together with a minimum response time objective function. 
This algorithm is applicable in contemporary computer systems, 
when it is important to prevent thrashing. It is based on an 
analogy with the knapsack problem, a classic linear programming 


problem. 
CHAPTER 8


Applications to Computer System Organization 


8.0. Introduction 

One aspect of the study of the resource allocation problem
has been to set up behavior models for computations in order to
provide a framework within which we can understand misunderstood
problems. An equally important aspect of the study is to examine
how programmers, system designers, and the computer system itself
might all cooperate in allocating resources.

We shall discuss three seemingly disparate aspects of computer
system organization. The first, the equipment configuration,
is the relationship among the program mix, the amount of processor,
and the amount of memory. The second, equipment pooling, is
effecting large processor-memory capacity by sharing equipment
at the finest hardware level. The third, multilevel memories,
is showing how to make better use of memory resources. The
relation among these three aspects is: each is concerned with
matching the equipment to the work load.
8.1. Toward Better Programming and System Design


There is every reason to believe that programmers can, 
by careful programming, create programs that run with small, 
compact working sets. They can do this, for example, by design- 
ing algorithms to work locally on information, and by employing 
data structures which induce highly local reference patterns.
Programmers who cooperate in this way will be rewarded, for 
their working sets will be smaller, memory-usage costs lower, 
and running times shorter. Thus, a first guiding principle, 
for programmers, is to design programs to have small working sets. 

The remaining guiding principles, for system designers, 
are applications of programming generality (Section 1.2) and 
of our results here. 

Perhaps the single most degrading factor in contemporary 
computer systems is the inability to manipulate small quantities 
of information easily. Ideally, the unit of information storage 
and transfer, universally used throughout the entire computer 
system from the highest level of memory to the lowest, should 
be the word. This is simply not feasible in contemporary systems 
on account of the high cost of accessing an item in auxiliary 
storage. The commonly-used compromise is that of paging: each
page comprises a block of words, the page size being chosen to
represent a compromise among wasted memory, complexity of
record-keeping (i.e., page tables, memory usage map), and cost of
transferring a page into main memory. And yet, the traverse-time cost
is still so high that paged memory systems have been besieged
with poor performance. Thus, a second guiding principle for the
system designer must be to make it convenient to manipulate small
quantities of information [let us aim for the word]. To do this,
he must take recourse to parallelism in the data channels, and
in the addressing and accessing mechanisms.

Our behavior models have verified quantitatively the intui- 
tively obvious fact that sharing of equipment becomes more and 
more successful when there are more and more participants. This 
generates a need for a great number of processors and for a great 
deal of main memory (much of which is wasted in contemporary sys- 
tems because the large traverse time induces so many protracted 
page waits). On the one hand, multiprogramming has made it
possible to effectively share the memory resource among many
computations. On the other hand, however, it is not yet possible
to achieve anywhere near complete utilization of processors.
Instead, a processor is dedicated to a single process, only one
instruction at a time being executed, and most of the equipment
(adders, multipliers, etc.) in a processor is idle¹. If, instead,
the individual hardware components of several processors were 
placed in pools (adders, multipliers, etc.), it would be possible 
to overlap a great many operations. By making it accessible 
from pools on demand, the same equipment contained in one modern 
processor could be used to service simultaneously a surprisingly 
large number of processes. Thus, a third guiding principle for 
the system designer is to permit pooling of small hardware units. 

In summary, the computer system designer must at the very 
least be guided by these principles: programming generality, 
small-working-set programs, ability to manipulate small quantities 


of information, and ability to pool small hardware units. 


¹ Some processors, such as the CDC 6600 and the IBM 360/91, attempt
to overlap operations by looking ahead a short distance in the 
instruction stream; but the monosequential nature of instruction 
streams makes it difficult to overlap more than a few operations. 
8.2. The Equipment Configuration


By the program mix we mean the collection of possible 
computations. By the equipment configuration we mean 
the proper relative choices of the number N of processors 
and the number M of main memory pages to achieve static balance. 
We shall show that, once any two of {program-mix,M,N} are arbi- 
trarily given, the third is determined. 

The following is meant to indicate the kind of procedure 
that may be used to determine the equipment configuration; it 
is not meant to be the only possible approach. 

We assume that all jobs are single-process computations, 
that they are statistically independent, and that their working 


sets do not overlap. 


8.2.1. Choosing the Balance Parameters $\alpha$ and $\beta$

The statistical properties of the program mix are the working
set size and the duty factor.

We assume that the working set size $\omega(t,\tau)$ is a stationary
random process (cf. Sections 3.3 and 4.4), and we let $\tau$ be
understood. Thus, we may write $\omega$ instead of $\omega(t,\tau)$. The mean
$\bar{\omega}$ and the variance $\sigma_\omega^2$ have already been derived in
Theorems 4.4 and 4.5.

The duty factor $\eta$ depends on the choice of working set
parameter $\tau$, the size $\omega$ of a job's working set, a job's quantum q,
and the traverse time T, as follows. If a job is assigned a
quantum q, it generates q information references. The steady
state missing-page probability is $\lambda(\tau)$, so the job expects to
encounter $q\lambda(\tau)$ page waits due to pages re-entering its working
set. In addition, its working set must be demand-paged into
memory at the start of its quantum, requiring an additional $\omega$
page waits, one for each page. The total expected page wait
time is $(q\lambda(\tau) + \omega)T$. The duty factor is

(8.2.1)   $\eta = \frac{q}{q + (q\lambda(\tau) + \omega)T} = \frac{1}{1 + (\lambda(\tau) + \frac{\omega}{q})T}$

Let

(8.2.2)   $\gamma = \lambda(\tau) + \frac{\omega}{q}$

so that

(8.2.3)   $\eta = \frac{1}{1 + \gamma T}$

For simplicity in the following discussion we assume that $\eta$ is
the same for all jobs (thus, $q = C\omega$ for some constant C; cf.
Theorem 7.6).

We assume that, whenever a job in the balance set is not in
page wait, it is running. In order that this be a good
assumption, there must be sufficient processor resources that the
probability

           $\Pr[\text{no processor available} \mid \text{a process exits page wait}]$

is arbitrarily small. We shall see shortly that this is the case.
In this case, we may regard $\eta$ as the probability that a process
is running.
We now define two random variables: W is the total working
set size of the balance set and P is the total processor
requirement of the balance set. We suppose there are n jobs in the
balance set. From our discussion in Chapter 7, if K is the
number of standby set queues, then n must satisfy $n \ge K$.

Let $\omega_i$ be the working set size of the $i$th job in the balance
set. Then the total working set size W of the balance set is

(8.2.4)   $W = \sum_{i=1}^{n} \omega_i$

Since the jobs are statistically independent and identically
distributed, we have

(8.2.5)   $\bar{\omega}_i = \bar{\omega}$
           $\sigma_{\omega_i}^2 = \sigma_\omega^2$

Then also

(8.2.6)   $\bar{W} = n \bar{\omega}$
           $\sigma_W^2 = n \sigma_\omega^2$

Define the binary random variable

(8.2.7)   $\pi_i = 1$ if the $i$th job is running
           $\pi_i = 0$ otherwise

From the discussion above,

(8.2.8)   $\Pr[\pi_i = 1] = \eta$
           $\Pr[\pi_i = 0] = 1 - \eta$

where $\eta$ is the duty factor. Since the jobs are statistically
independent and identically distributed,

(8.2.9)   $\bar{\pi}_i = \bar{\pi} = \eta$
           $\sigma_{\pi_i}^2 = \sigma_\pi^2 = \eta(1-\eta)$

The total processor requirement P of the balance set is

(8.2.10)  $P = \sum_{i=1}^{n} \pi_i$

and

(8.2.11)  $\bar{P} = n \bar{\pi} = n \eta$
           $\sigma_P^2 = n \sigma_\pi^2 = n \eta (1-\eta)$

Now, let numbers $\epsilon_N$ and $\epsilon_M$ be given, with $0 < \epsilon_N \le 1$ and
$0 < \epsilon_M \le 1$. These numbers $\epsilon_N$ and $\epsilon_M$ represent the allowable
processor and memory overflow probabilities. That is, we want to
choose M and N such that

(8.2.12)  $\Pr[W > M] \le \epsilon_M$
           $\Pr[P > N] \le \epsilon_N$

We must proceed carefully, because M and N are not independent.
But before proceeding, we must indicate how Pr[W>M] and Pr[P>N]
might be determined.

The Central Limit Theorem tells us that the sum of n
identically distributed, statistically independent random variables
becomes normally distributed for large n¹. We may therefore
approximate the distributions of W and P by normal distributions
(these approximations are surprisingly good, even for n as
small as 10 or 20; see Feller [F2, Vol. 1, p. 168ff]). That is,
we approximate $f_W(u)$ and $f_P(u)$ by

(8.2.13)  $f_W(u) = \frac{1}{\sqrt{2\pi}\, \sigma_W} \exp\left[ -\frac{1}{2} \left( \frac{u - \bar{W}}{\sigma_W} \right)^2 \right]$
           $f_P(u) = \frac{1}{\sqrt{2\pi}\, \sigma_P} \exp\left[ -\frac{1}{2} \left( \frac{u - \bar{P}}{\sigma_P} \right)^2 \right]$

and then

(8.2.14)  $\Pr[W > M] = \int_M^\infty f_W(u)\, du$
           $\Pr[P > N] = \int_N^\infty f_P(u)\, du$

Therefore, given $\epsilon_M$ and $\epsilon_N$ we can find M and N (using standard
tables for the normal distribution, such as [F2, Vol. 1, p. 167])
such that

(8.2.15)  $\epsilon_M = \int_M^\infty f_W(u)\, du$
           $\epsilon_N = \int_N^\infty f_P(u)\, du$

and so relate M to $\epsilon_M$ and N to $\epsilon_N$. It is now a simple matter
to choose M and N.


Let the memory size M be given. Then choose the largest n
such that²

(8.2.16)  $\Pr[W > M] = \Pr\left[ \sum_{i=1}^{n} \omega_i > M \right] \le \epsilon_M$

Using this value of n, find the smallest N such that

(8.2.17)  $\Pr[P > N] = \Pr\left[ \sum_{i=1}^{n} \pi_i > N \right] \le \epsilon_N$

It should be clear from eqs. 8.2.16 and 8.2.17 that, once
any one of the three quantities {n,M,N} has been arbitrarily
given, the other two are uniquely determined (all other things
being equal). Thus, it makes no difference which of {n,M,N} is
chosen first.

¹ The normal approximation is not the only way. For example,
the more powerful Chernoff Bound [W2, p. 67ff] shows that, given
random variables $z_i \ge 0$ with common density function $f_z(u)$, there
exists a decreasing function $h(A)$ depending only on $f_z(u)$ such
that

           $\Pr\left[ \sum_{i=1}^{n} z_i > nA \right] \le (h(A))^n$  with  $0 < h(A) \le 1$

² In Chapter 7, we required $n \ge K$, where K is the number of standby
set queues. Here we assume that M and N are large enough so that
the largest values of n satisfying eqs. 8.2.16 and 8.2.17 also
satisfy $n \ge K$.

To choose $\alpha$ and $\beta$, let {n,M,N} be chosen as above, and
set $\beta$ such that

(8.2.18)  $\beta M = \bar{W} = n \bar{\omega}$

and set $\alpha$ such that

(8.2.19)  $\alpha N = \bar{P} = n \bar{\pi} = n \eta$

The procedure discussed above is a worst-case procedure,
for the following reason. In a real computer system, the values
of $\alpha$ and $\beta$ so selected are lower bounds on the actual values that
may be used without violating the probabilities $\epsilon_M$ and $\epsilon_N$. That
is, writing $\alpha_0$ and $\beta_0$ for the values selected by the procedure,
the values of $\alpha$ and $\beta$ actually used may satisfy

(8.2.20)  $\alpha_0 \le \alpha \le 1$
           $\beta_0 \le \beta \le 1$

The reason for this is: the scheduler carefully regulates the
membership of the balance set, dynamically maintaining W within
close tolerance of $\beta M$ and P within close tolerance of $\alpha N$. The
procedure just described takes no account of this additional
certainty, that W is close to $\beta M$ and P is close to $\alpha N$. Thus, the
actual variance of W is less than $\sigma_W^2$ (eq. 8.2.6) and the actual
variance of P is less than $\sigma_P^2$ (eq. 8.2.11). There is more
freedom to choose larger $\alpha$ and $\beta$.
8.2.2. How Much Resource Slack?


We refer to the reserve $(1-\beta)M$ pages of memory as the
slack memory, and the reserve $(1-\alpha)N$ processors as the slack
processor. We want to show that, as M and N are increased and
$\epsilon_M$ and $\epsilon_N$ are held fixed, the relative amounts of slack
resources become negligible.

Theorem 8.1. Suppose $\epsilon_M$ and $\epsilon_N$ are given, $\alpha$ and $\beta$ are determined
according to the procedure above, and we let the number n
of jobs in the balance set increase without bound
(appropriately adjusting M and N to satisfy $\epsilon_M$ and $\epsilon_N$). Then

           $\alpha \to 1$
           $\beta \to 1$

Proof: We show that $\beta \to 1$, since the proof for $\alpha \to 1$ is exactly
the same. Since $0 < \beta \le 1$, it is enough to show:

           $\frac{1-\beta}{\beta} \to 0$

Refer to Figure 8-1, where we have plotted memory usage W, showing
it to be normally distributed with mean $\beta M = n\bar{\omega}$ and standard
deviation $\sigma_W = \sqrt{n}\, \sigma_\omega$. It is well known that, given $\epsilon_M$, the
probability Pr[W>M] depends only on the distance between M and $\beta M$,
measured in units of $\sigma_W$. That is, there exists a fixed constant
$b > 0$ such that

           $\Pr[W > M] = \Pr[W > \beta M + b \sigma_W]$

Then, as $n \to \infty$,

           $\frac{1-\beta}{\beta} = \frac{M - \beta M}{\beta M} = \frac{b \sigma_W}{n \bar{\omega}} = \frac{b \sigma_\omega}{\bar{\omega}} \cdot \frac{1}{\sqrt{n}} \to 0$

QED.
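
The rate at which the slack vanishes can be observed numerically.
The following Python sketch assumes the normal approximation of
Section 8.2.1 and treats M as a continuous quantity; the function
name and the sample mean and standard deviation of the working set
size are assumptions made for the illustration.

```python
from math import sqrt
from statistics import NormalDist

def beta_for(n, w_mean, w_sigma, eps_M):
    """
    Balance parameter beta when M is set so that Pr[W > M] = eps_M,
    i.e. M = n * w_mean + b * sqrt(n) * w_sigma with b fixed by eps_M.
    """
    b = NormalDist().inv_cdf(1.0 - eps_M)   # the constant b of the proof
    M = n * w_mean + b * sqrt(n) * w_sigma
    return n * w_mean / M

# Illustrative values (assumptions): mean 20 pages, deviation 5 pages.
for n in (10, 100, 10_000, 1_000_000):
    beta = beta_for(n, 20.0, 5.0, 0.01)
    print(n, beta, (1.0 - beta) / beta)
```

The last column shrinks in proportion to one over the square root
of n, as in the final line of the proof.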
[Figure 8-1. Memory usage (vertical axis: probability).]
8.2.3. Relations Among Processors, Memory, Traverse Time


There is an important three-way relationship among the 
number N of processors, the number M of main memory pages, and 
the traverse time T. If any two of these quantities are given, 


the third is determined. 


Theorem 8.2. Define $\beta M$ to be the expected amount of memory to
match $\alpha N$ processors. Then

(8.2.21)  $\beta M = \alpha N \bar{\omega} (1 + \gamma T)$

where $\bar{\omega}$ is the expected working set size, T is the traverse
time, and

(8.2.22)  $\gamma = \lambda(\tau) + \frac{\bar{\omega}}{q} = \lambda(\tau) + \frac{1}{C}$

has been discussed at eqs. 8.2.2 and 8.2.3.


Proof: Let {n,M,N} be chosen as discussed in Section 8.2.1. Then

    $\beta M = n\bar{\omega}$

    $\alpha N = n\eta = \frac{n}{1 + \gamma T}$

where $\eta$ is the duty factor, given by eq. 8.2.3. Eliminating n
between these two equations, we have

    $n = \alpha N (1 + \gamma T)$

so that

    $\beta M = n\bar{\omega} = \alpha N \bar{\omega} (1 + \gamma T)$

QED.
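
The statement that any two of the quantities fix the third can be sketched directly from the relation of Theorem 8.2. The function names and the sample numbers below are hypothetical; `gamma` stands for the quantity the text calls y (gamma), and `mean_ws` for the expected working set size.

```python
def memory_in_use(alpha_n, mean_ws, gamma, T):
    """Expected memory in use (beta*M) matching alpha_n busy processors."""
    return alpha_n * mean_ws * (1.0 + gamma * T)


def processors_in_use(beta_m, mean_ws, gamma, T):
    """Busy processors (alpha*N) that beta_m pages of memory can support."""
    return beta_m / (mean_ws * (1.0 + gamma * T))


def traverse_time(beta_m, alpha_n, mean_ws, gamma):
    """Traverse time T at which beta_m pages match alpha_n processors."""
    return (beta_m / (alpha_n * mean_ws) - 1.0) / gamma


# Illustrative figures only: 4 busy processors, 20-page working sets.
bm = memory_in_use(alpha_n=4, mean_ws=20.0, gamma=0.001, T=1000)
print(bm, processors_in_use(bm, 20.0, 0.001, 1000), traverse_time(bm, 4, 20.0, 0.001))
```

Each function is the same equation solved for a different unknown, so feeding the output of one into another recovers the original inputs up to rounding.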
Since Theorem 8.2 depends on average value arguments, it
is only an approximation to the actual behavior. Put another
way, we may only regard the relation
$\beta M = \alpha N \bar{\omega}(1+\gamma T)$ as stating a
necessary, but not sufficient, condition on the hardware
configuration. However, the discussion in Section 8.2.1 shows that
we can regard $\epsilon_M$ and $\epsilon_N$ as confidence levels
for this result.

The relation $\beta M = \alpha N \bar{\omega}(1+\gamma T)$ gives
further insight into the causes of thrashing (Section 3.6). Recall
that large values of T make the duty factor (and hence the
attainable processing efficiency) very sensitive to small changes
in the missing-page probability (here, represented by $\gamma$).
In Figure 8-2 we have indicated the behavior of the
processor-memory ratio:

(8.2.23)    $R = \frac{\beta M}{\alpha N \bar{\omega}} = (1 + \gamma T)$

for T = 1, 10, 100, 1000, and 10000 vtu. It is clear that, when
$\gamma$ is small and T is large, the slope of R is quite steep.
Small fluctuations of $\gamma$ can result in wild fluctuations in
R. Thus, if $R_d = (1+\gamma T)$ is regarded as representing the
desired processor-memory ratio, and
$R_a = \beta M / (\alpha N \bar{\omega})$ is regarded as
representing the actual processor-memory ratio, then these
fluctuations in $\gamma$ cause $R_a$ to be seriously mismatched
to $R_d$.

In Figure 8-3 we have shown that the expected amount $\beta M$
of memory grows linearly with the traverse time T. Indeed, for
large values of T,

(8.2.24)    $\beta M \approx \alpha N \bar{\omega} \gamma T$

Were we to reduce T by an order or two of magnitude, we could
also reduce the main memory requirement by as much as an order
or two of magnitude, without sacrificing efficiency.
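
The sensitivity of the processor-memory ratio to the traverse time can be illustrated with a few lines of arithmetic. This is only a sketch of the relation R = 1 + (gamma)(T); the function name `ratio` and the value gamma = 0.001 are assumptions chosen for illustration.

```python
def ratio(gamma, T):
    """Desired processor-memory ratio R = 1 + gamma*T."""
    return 1.0 + gamma * T


gamma = 0.001  # illustrative value only
for T in (1, 10, 100, 1000, 10000):
    before = ratio(gamma, T)
    after = ratio(2 * gamma, T)   # the same system after gamma doubles
    print(f"T={T:>5}  R={before:8.3f}  R(2*gamma)={after:8.3f}  growth={after / before:.3f}")
```

For small T, doubling gamma barely moves R; for T in the thousands, R nearly doubles as well, which is the mismatch between desired and actual ratios described above.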
Figure 8-2. Desired processor-memory ratio.

Figure 8-3. Relation among processors, memory, traverse time.

The relation $\beta M = \alpha N \bar{\omega}(1+\gamma T)$ shows
that $\beta M$ can increase if and only if $\alpha N$ increases,
all other things being equal. Thus, if $\alpha_0 N < \alpha N$
processors are available, then for some $\beta_0 < \beta$,
$\beta_0 M = \alpha_0 N \bar{\omega}(1+\gamma T)$, and
$(\beta - \beta_0)M$ main memory pages stand idle (that is, they
are in no working set). Similarly, if $\beta_0 M < \beta M$ main
memory pages are available, then for some $\alpha_0 < \alpha$,
$\beta_0 M = \alpha_0 N \bar{\omega}(1+\gamma T)$, and
$(\alpha - \alpha_0)N$ processors stand idle. A shortage in one
resource type inevitably results in a surplus of the other.

All our claims that large traverse times degrade performance 
and strain system resources can be substantiated. Fikes, et al. 
[F5] and Lauer [L1] report on their experience with the IBM 360/67 
Time Sharing System at Carnegie University, in which they replaced 
the drum auxiliary store with large (bulk) core storage. For 
their system, the drum traverse time is about 10 times larger 
than the large core storage transfer time. They report that 
throughput was increased by about a factor of 10 when the large 
core replaced the drum. This supports our remarks concerning 
eq. 8.2.24. Since a considerable amount of main memory space 
was reserved for use as buffers for the drum, removing the drum 
made a large quantity of additional memory available. 

Lauer [L1] points out that the equipment rental and main- 
tenance costs were about 10 per cent higher after the large core 
storage was introduced. Since, however, the system capacity is 
effectively 10 times greater with large core storage than without 
it, the increased size of the market (user community) more than 


offsets the increased cost. 




8.3. Pooling 


Equipment pooling can remarkably enhance throughput. To
verify this, we consider the conceptual experiment shown in
Figure 8-4.


Theorem 8.3. Suppose n requestors with identically distributed
    demands seek to use a given type of resource. Compare two
    cases: first, each requestor is given a private supply of
    resource; second, each requestor draws on demand from a
    pooled supply of the resource. In order that the probability

        $\pi$ = Pr[given requestor fails to obtain required resource]

    be the same in both cases, at least $\sqrt{n}$ times as much
    resource is needed to provide private supplies as is needed to
    provide a pooled supply.


Proof: Each of the n requestors requests a random amount y of
the resource, with density $f_y(u)$, mean $\bar{y}$, and variance
$\sigma^2$, independently of the others. For simplicity
assume $\bar{y} = 0$. Then

    $\sigma^2 = \overline{y^2} - \bar{y}^2 = \overline{y^2}
        = \int_{|u| \ge 0} u^2 f_y(u)\,du$

but

    $\int_{|u| \ge 0} u^2 f_y(u)\,du \ge \int_{|u| \ge R} u^2 f_y(u)\,du
        \ge R^2 \int_{|u| \ge R} f_y(u)\,du$

thus,

(8.3.1)    $\sigma^2 \ge R^2 \Pr[|y| \ge R]$

If

    $Y = \sum_{i=1}^{n} y_i$

then

    $\sigma_Y^2 = n\sigma^2$

and eq. 8.3.1 becomes

(8.3.2)    $\Pr[|Y| \ge R] \le \frac{n\sigma^2}{R^2}$

Let $\epsilon > 0$ be given. In Case 1 (Figure 8-4), the
probability $\pi$ of the theorem statement becomes, using
eq. 8.3.1,

    $\pi = \Pr[|y| \ge R_1] \le \frac{\sigma^2}{R_1^2} = \epsilon$

or,

    $R_1 = \frac{\sigma}{\sqrt{\epsilon}}$

and

    (total resource in Case 1) $= nR_1 = \frac{n\sigma}{\sqrt{\epsilon}}$

In Case 2, suppose (n-1) requestors have made their requests,
and then the nth request arrives, his request being $y_n$. Then
he fails to obtain his request just when

    $\sum_{i=1}^{n-1} |y_i| > nR_2 - |y_n|$

Therefore the probability $\pi$ of the theorem statement becomes

    $\pi = \int \Pr\left[\sum_{i=1}^{n-1} |y_i| > nR_2 - |u|\right] f_y(u)\,du$

    $\quad = \Pr\left[\sum_{i=1}^{n-1} |y_i| + |y_n| > nR_2\right]$

    $\quad = \Pr\left[\sum_{i=1}^{n} |y_i| > nR_2\right]
        \le \frac{n\sigma^2}{(nR_2)^2} = \epsilon$

from eq. 8.3.2. Then

    $R_2 = \frac{1}{\sqrt{n}} \cdot \frac{\sigma}{\sqrt{\epsilon}}$

and

    (total resource in Case 2) $= nR_2 = \sqrt{n}\,\frac{\sigma}{\sqrt{\epsilon}}$

Finally,

    $\frac{\text{(total resource in Case 1)}}{\text{(total resource in Case 2)}}
        = \frac{n\sigma/\sqrt{\epsilon}}{\sqrt{n}\,\sigma/\sqrt{\epsilon}} = \sqrt{n}$

QED.


Figure 8-4. Pooled vs. private resource supplies.
(Case 1: private supplies of resource. Case 2: pooled resource.)


It is therefore quite clear that sharing and pooling can 
significantly increase the usable capacity of a given amount 
of equipment, especially when n is large. 
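
The square-root advantage of pooling follows from the two bounds used in the proof of Theorem 8.3, and can be tabulated with a short sketch. The function names are hypothetical, `sigma` is the standard deviation of one requestor's demand, and `eps` is the tolerated failure probability; the figures are bounds of the Chebyshev type, not exact requirements.

```python
from math import sqrt


def private_total(n, sigma, eps):
    """Total resource when each of n requestors holds sigma/sqrt(eps) privately."""
    return n * sigma / sqrt(eps)


def pooled_total(n, sigma, eps):
    """Total resource in one shared pool giving the same failure bound eps."""
    return sqrt(n) * sigma / sqrt(eps)


for n in (1, 4, 100, 10000):
    p, q = private_total(n, 2.0, 0.01), pooled_total(n, 2.0, 0.01)
    print(f"n={n:>5}  private={p:10.1f}  pooled={q:8.1f}  ratio={p / q:6.1f}")
```

The ratio column equals the square root of n regardless of the values chosen for `sigma` and `eps`, since both cancel.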

It is one matter to realize that pooling at the finest
level of detail is beneficial, but it is quite another matter
to implement it. For pooling to operate without unnecessary loss
of speed, it is necessary to dispense with a centrally clocked
computer system and to rely wholly on asynchronous logic. Luconi
[L2] has studied some of the rather delicate issues attending
this problem.

Existing techniques make pooling of memory resources
possible, but they have not yet made pooling of processing hardware
possible at an equivalent level of detail: many programs can
reside in one memory unit, but only one process can use a
processing unit at a time.

The problem of pooling the memory resources can be very 
effectively solved by paging. The smaller the page size, the 
better the pooling. Unfortunately the long traverse times that 
predominate in contemporary computer systems make it just as 
expensive to move a small page as a large one. These systems, 
therefore, have been forced into using large page sizes and have 
not always performed as well as expected. Since physical limitations
make it impossible to reduce access times of rotating
storage devices to the required levels, we must turn to non- 
rotating storage devices and rely increasingly on parallel data 
channels and asynchronous logic in order to effect completely 
successful memory pooling. 

Therefore, the potential for effective pooling of memory 
resources already exists in contemporary computer systems. 

It is not the case in contemporary systems that a potential 
exists for achieving the degree of processor pooling needed. 
There are three reasons for this. 

The first reason is complexity of interconnection. In 
order to satisfy objectives of reliability, expandability, and 
programming generality, it has been standard practice to allow 
each of the (say) n processors free and unrestricted access to 
each of the (say) m main memory modules, as indicated in Figure 8-5. 
It is not hard to see that the complexity of the interconnection
grows as the product (mn), whereas the processor-memory capacity
grows only as the sum (m+n). Indeed, realizing the required large
number of processors and memory units may not, using full
interconnection, be at all feasible. Pooling will have to occur at
a much finer level.

Figure 8-5. Full interconnection of processors and memories.
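
A small tabulation makes the gap between interconnection cost and capacity concrete. This sketch assumes one connection per processor-memory pair, as in a full crossbar; the function names are illustrative only.

```python
def crosspoints(n_processors, m_memories):
    """Connections needed when every processor reaches every memory module."""
    return n_processors * m_memories


def units(n_processors, m_memories):
    """Number of hardware units contributing capacity."""
    return n_processors + m_memories


for k in (2, 8, 32, 128):
    print(f"n=m={k:>3}  connections={crosspoints(k, k):>6}  units={units(k, k):>4}")
```

With equal numbers of processors and memories, the connection count grows as the square of the unit count divided by four.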

The second reason is the large amount of information needed 
to specify a computation. In Multics, for example, a myriad of 
tables and lists are needed in order to completely specify a 
process’s name space and to allow it to be interrupted at ar- 
bitrary times and yet be properly restarted. These tables have 
two deleterious effects. First, it is expensive to switch a 
processor between processes, partly because of all the infor- 
mation that must be loaded into the processor registers, partly 
because of operating system scheduling functions. Second, the 
tables that must be loaded into main memory while a process is 
active occupy considerable space and reduce the memory space 
available to a program’s working set. Unless all this software 
complexity is rooted out, it will remain impractical to implement 
pooling of hardware at a fine level. At the end of this chapter
we shall discuss a highly organized name-space information
structure that may one day produce a solution to these problems.

The third reason is lack of parallelism in the hardware. 
Processor pooling implies considerable process activity, which 
in turn implies considerable information movement. Parallelism 
on the data channels between levels of memory and in the addres- 
sing hardware is needed if the memory system is to be capable of 
handling the information flows induced by busy processor hardware.

Therefore, although there is much work to be done on memory
system organization and the structuring of information, there is
even more work to be done on basic hardware design so that the
required degree of processing can be achieved.


8.4. Multilevel Memory Systems


A memory hierarchy, or multilevel memory, is a sequence of 
increasingly capacious and successively slower-access memory 
devices. Its general organization is shown in Figure 8-6. 

There are n main levels, $M_1, \ldots, M_n$, and m auxiliary
levels, $A_1, \ldots, A_m$. Each main level device may be addressed
directly by a processor. Information residing in auxiliary devices
must be moved into a main level (namely, $M_n$) before it can be
referenced. We assume that information can migrate only between
adjacent levels. It is also possible that each auxiliary device
(such as $A_i$) feeds directly into $M_n$, rather than into another
auxiliary device.

By splitting main memory into several levels, we intend 
to model computer systems using large core storage in addition 
to the high speed execution store. In these systems, we would 
have n=2; for generality, we allow arbitrary n. The auxiliary 
devices may be drums, disks, tapes, etc. 

Each main level device $M_i$ has an access time $a_i$, representing
the time required to reference one word in level $M_i$. The access
times satisfy

    $a_1 < a_2 < \cdots < a_n$

We take the access time $a_1$ of the fastest memory $M_1$ to be one
virtual time unit (vtu).


Figure 8-6. Organization of multilevel memory.


Define $T_{ij}$ to be the traverse time from device i to device j.
Here,

    $T_{ij} = \sum_{k=i}^{j-1} T_{k,k+1}$,    $i < j$

We assume $T_{ij} = T_{ji}$, and that


E Sroavere OS E <T See SIP 
These traverse times include queue delays, mechanical positioning
times (if any), access times, and page transmission times.
The traverse times to auxiliary devices usually depend on rotation
times, and so queue delays and transmission times are
usually negligible components in them. The traverse times between
main levels are composed mostly of transmission times,
since access times are small. Typical traverse times, using
1 vtu = 1 microsecond, are:


type of device      access time    traverse time (page = 1K words)

thin film           0.1 vtu        100 vtu     (0.1 ms.)
high speed core     1 vtu          1000 vtu    (1 ms.)
slow speed core     8 vtu          8000 vtu    (8 ms.)
high speed drum     10^4 vtu       10^4 vtu    (10 ms.)
moving-arm disk     10^5 vtu       10^5 vtu    (100 ms.)


When the traverse times depend on the rotation time of 
a device, we assume that shortest-access-time scheduling tech- 
niques, known to be optimum [C2,D3], are used. We may thus 
assume that each such traverse time is as small as physically 
possible. 

The cost of storing one word for one unit of time is less
at lower levels, $M_1$ being the most expensive and $A_m$ least
expensive. The total storage capacity is assumed sufficient for
system needs.

The combined capacity of the main levels should certainly
be sufficient to contain the balance set. But, in order that
lower traverse times may be effective, we strongly recommend
that the main levels be sufficiently capacious that standby set
jobs may also have their working sets present in the main levels.
Thus, a job re-entering the balance set need not experience
paging delays at the start of its quantum, and much higher
processing efficiency is possible.

We assume information moves upward only on demand, and 
downward as it falls out of use. 

We assume that the unit of storage in the main levels is
the page, and the page size is the same in each main level. The
unit of transfer between main levels is the page. We assume the
unit of storage in the auxiliary levels is the segment. The unit
of transfer between auxiliary levels, and between $M_n$ and $A_1$,
is also the segment. Since information must reside in a main level
to be addressable, a reference to information in an auxiliary
level must always involve the transfer of a segment into $M_n$
before the reference can be completed.

The basic strategy we adopt for managing multilevel memories 
is to place information at whatever level results in the least 
memory-usage cost (space-time product). 

There are three questions we must answer: 

1. How are the main levels to be managed? 

2. How are the auxiliary levels to be managed? 

3. What is the role of pre-paging? 

We shall use notions of locality and notions of cost to develop
guidelines for strategies in each of these areas.

8.4.1. Managing the Main Levels


In a multilevel memory, a working set allocation policy
guarantees a computation the use of processors if and only if
there is enough uncommitted space among the main levels to
contain its working set. Thus, if $W(t,\tau)$ is the working set of
some computation, it resides somewhere in

    $M_1 \cup M_2 \cup \cdots \cup M_n$

We must refine the working-set definition in order to decide
at which level each page of $W(t,\tau)$ shall reside.

It should be apparent that we should use a value for $\tau$ that
permits most of a program to reside in the main levels, because
space is more abundant than in a single-main-level system, and
because we want to assure higher processing efficiency. For
example, given $\epsilon$, we can choose $\tau$ such that the
missing-page probability satisfies

(8.4.1)    $\lambda(\tau) = 1 - F_x(\tau) \le \epsilon$

where $F_x(\tau)$ is the interreference distribution.

Let Z be a program. For each page i in Z we define the
reference density $\rho_i(t,\tau)$ at time t to be

(8.4.2)    $\rho_i(t,\tau) = \frac{\text{number of references to } i \text{ in } (t-\tau,\,t)}{\tau}$

Then the working set of the computation using Z is

(8.4.3)    $W(t,\tau) = \{\, i \in Z \mid \rho_i(t,\tau) > 0 \,\}$

Let $\theta_0, \theta_1, \ldots, \theta_n$ be a set of thresholds,
where

(8.4.4)    $1 = \theta_0 > \theta_1 > \cdots > \theta_n = 0$

Then we partition $W(t,\tau)$ into n subsets, where

(8.4.5)    $W^k(t,\tau) = \{\, i \in Z \mid \theta_{k-1} \ge \rho_i(t,\tau) > \theta_k \,\}$

(8.4.6)    $W(t,\tau) = \bigcup_{k=1}^{n} W^k(t,\tau)$

and $W^k(t,\tau)$ is the set of pages to reside in level $M_k$.

This definition is based on a refinement of locality, the
concept that, during any execution interval, a process favors
some of its pages. The reference density $\rho_i(t,\tau)$ measures
the degree to which page i is being favored. We assume that the set
of favored pages (measured by $W(t,\tau)$) is not likely to change
abruptly. In addition, we assume that the reference densities
$\rho_i(t,\tau)$ are not likely to change abruptly.
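
The partition of a working set among main levels by reference density can be sketched as follows. This is a minimal illustration under stated assumptions: reference times are integers in virtual time, the window is taken as half-open (t - tau, t], and all function names and the sample thresholds are hypothetical.

```python
def reference_density(ref_times, t, tau):
    """References to one page in the window (t - tau, t], divided by tau."""
    return sum(1 for s in ref_times if t - tau < s <= t) / tau


def level_of(density, thresholds):
    """Return k with thresholds[k-1] >= density > thresholds[k], else None."""
    for k in range(1, len(thresholds)):
        if thresholds[k - 1] >= density > thresholds[k]:
            return k
    return None  # zero density: the page is outside the working set


def partition(refs, t, tau, thresholds):
    """Map each level k to the set of pages assigned to main level M_k."""
    levels = {k: set() for k in range(1, len(thresholds))}
    for page, times in refs.items():
        k = level_of(reference_density(times, t, tau), thresholds)
        if k is not None:
            levels[k].add(page)
    return levels


refs = {"a": list(range(91, 101)), "b": [95, 99], "c": [100], "d": [50]}
print(partition(refs, t=100, tau=10, thresholds=[1.0, 0.5, 0.1, 0.0]))
```

In this example page "d" has not been referenced within the window, so it belongs to no level and would be a candidate for removal from the main levels.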

The capacity of each main level can be determined by 
suitable generalizations of the procedures already discussed 
in Section 8.1. 


The thresholds $\theta_k$ represent tradeoffs between the cost of
not having a page in level $M_k$ and running more slowly, versus
having a page in level $M_k$, running more quickly, and paying the
overhead of moving the page.

One method for setting the thresholds $\theta_k$ is as follows.
Let q be the average quantum, over all jobs, and let S be the
page size. We wish to decide whether to move page i from level
$M_k$ into $M_{k-1}$. If page i is moved, the saving in running
time during q is:

    $q(a_k - a_{k-1})\rho_i(t,\tau)$

where $a_k$ is the access time to level $M_k$. The time required to
move the page is

    $S a_k$

since the page transfer proceeds at the slower of the two
access times $a_k$ and $a_{k-1}$. The page should be moved if

(8.4.7)    $q(a_k - a_{k-1})\rho_i(t,\tau) > S a_k$

That is, whenever

(8.4.8)    $\rho_i(t,\tau) > \theta_{k-1} = \frac{S a_k}{q(a_k - a_{k-1})}$
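
The threshold rule can be evaluated numerically. The sketch below assumes a 1024-word page, a quantum of 100000 vtu, and access times of 8 vtu and 1 vtu for the lower and upper levels; these figures and the function names are illustrative assumptions.

```python
def move_threshold(page_size, quantum, a_k, a_k_minus_1):
    """Reference density above which moving a page up one level pays off."""
    return page_size * a_k / (quantum * (a_k - a_k_minus_1))


def should_move_up(density, page_size, quantum, a_k, a_k_minus_1):
    """Compare running-time saving over one quantum with the transfer time."""
    saving = quantum * (a_k - a_k_minus_1) * density
    transfer = page_size * a_k
    return saving > transfer


theta = move_threshold(page_size=1024, quantum=100000, a_k=8.0, a_k_minus_1=1.0)
print(round(theta, 5))
print(should_move_up(0.02, 1024, 100000, 8.0, 1.0))
print(should_move_up(0.01, 1024, 100000, 8.0, 1.0))
```

A longer quantum or a larger gap between access times lowers the threshold, so more pages qualify for the faster level.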


Hardware not presently commercially available would be
needed to implement automatic memory management using these
ideas¹. Whenever the reference density of a page in level $M_k$
exceeds $\theta_{k-1}$, the page is moved into $M_{k-1}$. Whenever
the reference density of a page in level $M_k$ falls below
$\theta_k$, the page is a candidate for removal to $M_{k+1}$. The
least recently used non-working set pages in $M_n$ are candidates
for removal to $A_1$.


¹For example, we could associate a $\tau$-bit shift register with
each page-block of main memory. The bit pattern in the register
is shifted once every time unit. A 1 is entered into
the register if the page is referenced, 0 otherwise. The
number of 1's contained in the register can be used as a
measure of the reference density.
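
The shift-register scheme of the footnote can be simulated in software. This is only a sketch of the idea, with a hypothetical class name; the hardware the text envisions would perform the shift once per time unit for every page-block in parallel.

```python
class ReferenceRegister:
    """Software model of a tau-bit shift register attached to one page-block."""

    def __init__(self, tau):
        self.tau = tau
        self.bits = 0

    def tick(self, referenced):
        """Shift once per time unit, entering 1 if the page was referenced."""
        mask = (1 << self.tau) - 1          # keep only the latest tau bits
        self.bits = ((self.bits << 1) | int(referenced)) & mask

    def density(self):
        """Fraction of the last tau time units in which the page was referenced."""
        return bin(self.bits).count("1") / self.tau


reg = ReferenceRegister(tau=4)
for was_referenced in (True, False, True, True):
    reg.tick(was_referenced)
print(reg.density())
```

After tau consecutive time units without a reference the register is all zeros, which corresponds to the page leaving the working set.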
8.4.2. Managing the Auxiliary Levels


Because the high access times make it expensive to reference
information stored in auxiliary levels, we assume that the
unit of information storage and transfer among the auxiliary
levels is the segment. Moreover, an entire segment is moved
from $A_i$ into $M_n$ whenever one of its pages is referenced.

The best strategy for managing auxiliary levels is the
least-recently-used strategy. As a segment falls out of use,
it finds its way into the lowest levels. A segment is moved
upward only when it is referenced.

The reason this strategy is best follows from a locality
concept, though not exactly the same concept we have been using
for program behavior. The locality concept of interest here
is locality in people's behavior and actions. The longer it has
been since a person used a certain segment, the more likely it
is that he has forgotten about it or that he no longer cares
about it, and so the less likely it is that the segment is of
immediate use to him.
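
One way to picture the least-recently-used ordering of segments across auxiliary levels is sketched below. The function name, the capacities (counted in segments), and the idea of computing the whole placement at once are assumptions made for illustration; a real system would demote segments incrementally as they age.

```python
def place_segments(last_used, capacities):
    """Assign segments to auxiliary levels 1, 2, ... by recency of use.

    last_used maps a segment name to the time of its last reference;
    capacities[j] is the number of segments level j+1 can hold.
    """
    order = sorted(last_used, key=last_used.get, reverse=True)  # most recent first
    placement, start = {}, 0
    for level, capacity in enumerate(capacities, start=1):
        for segment in order[start:start + capacity]:
            placement[segment] = level
        start += capacity
    return placement


print(place_segments({"s1": 10, "s2": 50, "s3": 30}, capacities=[1, 2]))
```

The most recently used segment stays in the first auxiliary level, and older segments settle into the lower, cheaper levels.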
8.4.3. What About Pre-Paging?


When, if at all, is it worthwhile to load a job's information
into main storage prior to its execution?

The chief argument for pre-paging is as follows [L1].
Suppose it requires a traverse time T to acquire a page from a
drum auxiliary memory, and that we wish to demand-page an n-page
working set into memory. What is the space-time product (cost)
of this operation? For k = 1,...,n, paging in the kth page results
in k pages standing idle in memory, at a cost of kT. The total
cost is

(8.4.9)    $\sum_{k=1}^{n} kT = \frac{n(n+1)}{2}\,T$

On the other hand, by careful drum management, it is possible
to write out the n-page working set as a contiguous block and
read it back in as a contiguous block, the read-in operation
requiring about one traverse time T (since the page transmission
times are so much less than the rotation time). The cost of
the operation is nT, since n pages of memory must be reserved
before the paging operation can begin. Let C denote the
additional cost of identifying working set pages (so that they can
be paged out as a block) and carrying out the page-out operation.
Then pre-paging is better if

(8.4.10)    $nT + C < \frac{n(n+1)}{2}\,T$

It is usually possible to make the cost C small enough to satisfy
eq. 8.4.10. Apparently, then, pre-paging is worthwhile when
used to obtain information from a rotating device.
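
The comparison between demand paging and pre-paging can be worked through with sample numbers. The following sketch uses a 20-page working set and a traverse time of 10000 vtu purely as illustrative values; the function names are hypothetical.

```python
def demand_paging_cost(n, T):
    """Space-time cost of demand-paging n pages, one traverse time each."""
    return n * (n + 1) // 2 * T


def prepaging_cost(n, T, C):
    """Space-time cost of reading n pages as one block, plus overhead C."""
    return n * T + C


def prepaging_is_better(n, T, C):
    return prepaging_cost(n, T, C) < demand_paging_cost(n, T)


def break_even_overhead(n, T):
    """Largest overhead C for which pre-paging still does not lose."""
    return demand_paging_cost(n, T) - n * T


print(demand_paging_cost(20, 10000), prepaging_cost(20, 10000, 500000))
print(break_even_overhead(20, 10000))
```

The break-even overhead grows as the square of the working set size, which is why the argument favors pre-paging so strongly when no other costs intervene.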
If the information to be pre-paged resides in a large
core storage, where the traverse time depends only on page trans- 
mission time, pre-paging is not worthwhile. With rotating de- 
vices, it takes about as much time to move a block of n pages 
as it does to move one page, whereas with non-rotating devices 
it takes one traverse time to move each page. There is clearly 
no gain from pre-paging information stored in a non-rotating 
device. 

Nevertheless, we do not believe pre-paging is worthwhile 
in the multiprocess computer systems we have described. The 
argument given to derive eq. 8.4.10 depended on there being 
no sharing of information. If a working set is to be paged 
out in a block, we must exclude the shared pages from this oper- 
ation. But then, when paging the working set back in, these 
shared pages may not be available, and additional effort is 
needed to locate them. The costs C (eq. 8.4.10) of identifying 
pages, of careful drum management, of handling the page-out 
operation, and of recovering the missing shared pages, can easily 
outweigh the potential savings. Other arguments against
pre-paging have been presented in Section 3.4.

We do not, therefore, subscribe to pre-paging working set 
pages in multiprocess computer systems, unless no sharing is 
possible. Furthermore, if the multilevel memory system of 
Figure 8-6 is used, it is unlikely that a working set of a 
standby set computation will leave the main levels, so there is
no need for pre-paging.

We do, however, feel that it is possible to anticipate
that an item will be referenced even before it is in a working
set, and begin moving it into higher levels beforehand. This
requires a new concept of information structures, which we
discuss in the next section.

8.5. The Environment Graph Information Structure

Recent work by Dennis [D10] on the design of highly parallel
computer systems has produced interesting concepts which
can greatly simplify the solution to the resource allocation
problem. The most important concept is that of the environment
graph information structure.

A naming scheme is a set of rules for relating occurrences
of identifiers to items represented in the computer's memory
system. We assume here that the same naming scheme is used
throughout the entire memory system from the lowest level to
the highest, and throughout the entire execution of a process
from its first reference to the last. This means that all
references can be handled within the hardware, and in particular
that levels of memory can communicate directly with one another
without having to consult an operating system procedure¹.


The environment graph is a generalization of the file directory
structure [D1] to a level of detail so fine that every
word has a named position. The environment graph is a directed
acyclic graph having a single root node from which there is at
least one directed path to every other node in the structure.
Every node has a label. Figure 8-7 shows an example of an
(unlabelled) environment graph. All paths are assumed to be
directed downward.


¹Multics-like systems do not have a uniform naming scheme. All
user information is embedded in a system-wide file directory
structure [D1]. A program makes its first reference to a segment
by means of a tree name, which may incur many costly references
to little-used file directories stored in auxiliary
levels before the desired segment is located. Once located,
a segment is assigned a segment number; subsequent references
take place using segment numbers and are handled automatically
by hardware. The dreadful inefficiency of referencing information
buried in the file directory structure makes it necessary
to have a second, more efficient naming scheme that
streamlines later information references.


If $v_1$ and $v_2$ are nodes, and there is a directed path from
$v_1$ to $v_2$, then $v_2$ is a descendent of $v_1$. A subgraph is
any node v together with all its descendents; subgraphs represent
data structures, such as files, arrays, procedures, etc. Figure 8-8
shows how the linear sequence of instructions

    $v_0 = (v_1, v_2, \ldots, v_n)$

would be represented. The leaf nodes (those with no descendents)
represent actual data values, whereas internal nodes represent
named information structures (an internal node is always
interpreted as the root of a subgraph).

A data value or structure is identified by selecting a path 
to it from the root node. A special data type, the pointer, may 
designate some internal node as being the most recent reference 
point of a process. The process makes new references with res- 
pect to its pointer, not with respect to the root. 

Define the k-orbit of a node v to be the set of nodes that
are connected to v by a shortest undirected path of length k.
The k-sphere of a node v is the union of the j-orbits for
$1 \le j \le k$. If a process has its pointer at node v it will
generally make its next reference to some node in the 1-sphere of
v, its second reference to some node in the 2-sphere of v, and its
kth reference to some node in the k-sphere of v. Therefore the
environment graph can be used to anticipate references: if we
observe the pointer at node v, we can say reliably that the next k
references will occur within the k-sphere of v.

Figure 8-7. An environment graph.

Figure 8-8. Representation of a linear sequence of words.
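
The k-orbits and k-spheres of a node can be computed by a breadth-first search over the graph with edge directions ignored. The sketch below is one possible reading of the definitions; the edge-list representation, the function names, and the small example graph are assumptions made for illustration.

```python
from collections import deque


def orbits(edges, v, k_max):
    """Map each k in 1..k_max to the k-orbit of v (undirected distance k)."""
    adjacent = {}
    for a, b in edges:                      # ignore edge direction
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)
    distance = {v: 0}
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if distance[x] == k_max:            # no need to search beyond k_max
            continue
        for y in adjacent.get(x, ()):
            if y not in distance:
                distance[y] = distance[x] + 1
                queue.append(y)
    result = {k: set() for k in range(1, k_max + 1)}
    for node, d in distance.items():
        if d >= 1:
            result[d].add(node)
    return result


def sphere(edges, v, k):
    """The k-sphere of v: the union of its j-orbits for 1 <= j <= k."""
    return set().union(*orbits(edges, v, k).values())


edges = [("root", "a"), ("root", "b"), ("a", "c"), ("a", "d"), ("b", "d")]
print(orbits(edges, "a", 2))
print(sphere(edges, "a", 2))
```

A memory manager following the text would use such orbits to decide which nodes to bring toward the faster levels when the pointer arrives at v.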

Consider the multilevel memory system shown in Figure 8-9.
Suppose a node v residing in level $M_n$ is referenced, requiring
it to be moved into level $M_1$. In anticipation of future
references, v's 1-orbit should be moved into level $M_1$, its
2-orbit into level $M_2$, and so on: when a node is moved upward k
levels, its (n-k)-sphere is moved upward k levels. The k-orbit of
any node v in level $M_1$ should not lie below level $M_k$ in the
memory system.

Since the file directory structure used in many contemporary
computer systems may be regarded as an environment graph whose
nodes are segments (a node is a directory segment if and only
if it is an internal node), a similar procedure might be used
to anticipate segment references. Keep all of a segment on one
level. If we observe that a directory at level k is consulted, we
bring it into level $M_1$, all its contents to level $M_2$, etc.
By using the environment graph information structure together
with a uniform naming scheme and highly parallel automatic
memory management hardware, these goals are met:

1. There is sufficient detail in the environment graph
   to specify a process, so that little more than a pointer
   is needed to remember where the last reference took place.
   This eliminates complex auxiliary tables needed to specify
   a computation, conserves memory, and permits rapid
   inter-process switching.

2. Sharing is natural to implement. User A, whose local
   root node is $v_A$, can share subgraph $v_B$ of user B simply
   by introducing the edge $(v_A, v_B)$ into the environment
   graph (with B's permission).

3. Protection is natural to implement. User A cannot
   reference any node $v_B$ for which there is no path
   $(v_A, v_B)$; and the path cannot be established without
   permission from user B.

4. Locality is implied by the k-spheres. Given that a
   process has referenced node v, its next k references
   will generally fall within the k-sphere of v. In
   managing multilevel memories, the k-orbit of any node
   v in level $M_1$ should not lie below level $M_k$. Working
   set concepts can be used to decide when a node is to
   be moved downward.

Figure 8-9. Multilevel memory for use with environment graph.


8.6. Summary


Programmers and system designers should keep in mind
certain guidelines, where applicable:

1. locality. 

2. programming generality. 

3. uniform naming schemes. 

4. pooling of equipment at the finest level of detail. 

5. parallelism. 

6. ability to manipulate small quantities of information. 

The equipment configuration can be described analytically. 
Relations among program properties, processor-memory resources, 
and traverse times were derived. There is strong evidence 
favoring the use of large core storage at the upper levels of 
memory. 

In order to utilize equipment fully and to obtain the 
required capacity, it is necessary to pool small hardware units. 
If this is done successfully, it is possible to obtain many times 
the capacity with little more equipment than is currently used 
in computer systems. 

Management of multilevel memories can be handled using
generalized working set concepts. The environment graph
information structure provides a method for anticipating
information references.

CHAPTER 9


Performance Measures and Accounting Procedures 


9.0. Introduction 


Once we have accepted the working set model and the ideas 
of demand and balance as being valid and useful approaches to 
the resource allocation problem, the set of performance measures 
is more clearly defined. We shall review the relevant probability 
distributions and indicate how their measurement is useful, not 
only for proper regulation of the computer system, but also for 
assisting the administration in setting its operating policies. 
We shall complete the discussion begun in Chapter 1 regarding
metering of resource usage and attributing of charges; of
particular interest are methods for charging for shared information.

9.1. What to Measure and Why


The measures fall into three classes, according to their purpose:

1. Working set measures. The distribution of the page
   interreference intervals, the working set size distribution,
   and the autocorrelation function for working set size are
   needed for the proper (program-dependent) determination of
   the working set parameter $\tau$, and for better understanding
   of program behavior.

2. System control measures. The joint demand distribution,
   the job running-time distribution, and the queue length
   distribution are needed to decide what equipment is
   needed and to arrive at a solution for the balance
   policy from the mathematical programming problem
   equations. Here, the efficiency, the missing-page
   probability, and the traverse time serve three purposes:
   first, to determine sensitivity to thrashing; second,
   to determine the equipment configuration; and third,
   to provide additional (non-program-dependent) criteria
   for selecting the working set parameter $\tau$. Finally,
   the variation of the balance set demand $(p_b, m_b)$ about
   $(\alpha,\beta)$ is useful for deciding on the choices of the
   balance parameters $\alpha$ and $\beta$.

3. Policy-determining measures. The queue-length distribution
   (equivalently, the distribution of unserviced demands in
   the standby set) serves as an indicator to the
   administration when user community demand is outstripping
   supply. The relationships among total community demand,
   bidding, and price will have to be measured in order to
   be able to set prices.
We discuss each measure separately and indicate how it
applies to each of these three categories.


(1). Interreference distribution $F_x(u)$. (cf. Section 4.1). The
page interreference intervals $x$, so intimately related to working
set properties, have appeared again and again in our discussions.
Although we defined them in virtual time, because virtual time
renders invisible the vagaries of paging and arbitrary sequencing
of scheduled jobs, we can also define them in real time and obtain
directly the real-time working set properties. To do this with the
assurance that the derivations are correct, we must first convince
ourselves that page waits and scheduling interrupts are distributed
uniformly among the jobs. Since working set memory management
strategies assure statistical independence among jobs, and since
the scheduler is assumed fair, we may be assured of non-distorted
measures.
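
As an illustration only, the interreference intervals of a recorded
reference string can be tabulated as sketched below. The sketch
assumes one reference per unit of virtual time; the function names
and the sample trace are hypothetical and are not part of the
measurement procedure described in the text.

```python
from collections import Counter


def interreference_intervals(refs):
    """Return the intervals between successive references to the same page.

    Assumption: refs[t] is the page referenced at virtual time t.
    """
    last_seen = {}
    intervals = []
    for t, page in enumerate(refs):
        if page in last_seen:
            intervals.append(t - last_seen[page])
        last_seen[page] = t
    return intervals


def empirical_cdf(intervals):
    """Return {u: fraction of observed intervals <= u} for each observed u."""
    counts = Counter(intervals)
    total = len(intervals)
    cdf = {}
    running = 0
    for u in sorted(counts):
        running += counts[u]
        cdf[u] = running / total
    return cdf


# Hypothetical reference string over pages a, b, c.
refs = ["a", "b", "a", "c", "b", "a"]
intervals = interreference_intervals(refs)
cdf = empirical_cdf(intervals)
```

The same tabulation applies in real time if each reference is
stamped with its real-time instant instead of its index.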


(2). Working set size distribution $F_w(u)$. (cf. Section 4.4).
Measurements of individual working set sizes are needed to obtain
more insight into the behavior of programs, answering questions
such as: How strong is locality across the range of program types?
How does $w(t,\tau)$ vary across the execution of a program? How
successful can programmers be who attempt to design programs with
small, compact working sets?
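
A minimal sketch of such a measurement follows. It assumes that the
working set at time t consists of the pages referenced in the last
tau references up to and including t, with one reference per unit of
virtual time; the helper names and the trace are hypothetical.

```python
from collections import Counter


def working_set(refs, t, tau):
    """Pages referenced in the window of tau references ending at index t."""
    start = max(0, t - tau + 1)
    return set(refs[start:t + 1])


def working_set_sizes(refs, tau):
    """Working set size at every instant of the trace."""
    return [len(working_set(refs, t, tau)) for t in range(len(refs))]


def size_distribution(sizes):
    """Empirical probability of each observed working set size."""
    counts = Counter(sizes)
    return {size: n / len(sizes) for size, n in counts.items()}


# Hypothetical reference string.
refs = ["a", "b", "a", "c", "c", "c", "d"]
sizes = working_set_sizes(refs, tau=3)
dist = size_distribution(sizes)
```

Repeating the computation for several values of tau shows how the
size distribution depends on the working set parameter.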


(3). Correlation function $R_w(u,\tau)$. (cf. Section 4.8). The
correlation between working set size at two times is invaluable
not only for examining locality, but also for assisting in the
proper choice of $\tau$ and evaluating the predictive ability of a
measurement of the working set size. To determine $R_w(u,\tau)$,
proceed as follows. Let $\{t_n\}_{n=0}^{N}$ be a long sequence of N
equally-spaced instants at which the size $w(t,\tau)$ is sampled,
and let $w_n = w(t_n,\tau)$ denote the size at time $t_n$. Then the
value of $R_w(u,\tau)$ at time spacing $u_i$ is:

$$ R_w(u_i,\tau) = \frac{1}{N-i} \sum_{k=0}^{N-i} w_k \, w_{k+i} $$

Two things must be noted: the number N of samples must be large
so that i may become large enough, with $N \gg i$, to make the
samples $w_k$ and $w_{k+i}$ statistically independent; and
$R_w(u_i,\tau)$ depends on $\tau$, and so it will have to be
measured for a family of $\tau$-values.
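
The estimate can be sketched in a few lines. The sketch takes the
sampled sizes w[0], ..., w[N] for one fixed value of the working set
parameter and forms the sum of products w[k] * w[k+i] for k from 0
to N-i, divided by N-i, which is our reading of the formula above.
The function name and the sample values are hypothetical.

```python
def working_set_correlation(w, i):
    """Estimate the correlation of working set size at lag i.

    w holds the samples w[0..N]; the normalization by N - i follows
    the formula in the text (an assumption about its intended form).
    """
    n_last = len(w) - 1  # this is N in the text
    if not 0 <= i < n_last:
        raise ValueError("lag i must satisfy 0 <= i < N")
    total = sum(w[k] * w[k + i] for k in range(n_last - i + 1))
    return total / (n_last - i)


# Hypothetical samples of working set size at equally spaced instants.
w = [2, 3, 3, 4, 2]
r0 = working_set_correlation(w, 0)
r1 = working_set_correlation(w, 1)
```

In practice the sequence must be far longer than this one, and the
whole computation is repeated for each value of the working set
parameter of interest.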


(4). Working set intersections. (cf. Section 3.1.3). A study
of the size of the intersection between the working sets
$W(t,\tau)$ and $W(t+y,\tau)$ of a certain process, as a function
of y, would provide insight into the predictive ability of working
sets. Also of interest is the effect of an interaction during
$(t, t+y)$ on the intersection, as a function of the duration of
the interaction.
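
A sketch of the proposed study, for a recorded reference string, is
given below. It assumes that the working set at time t is the set of
pages referenced in the last tau references up to and including t;
the helper names and the trace are hypothetical.

```python
def working_set(refs, t, tau):
    """Pages referenced in the window of tau references ending at index t."""
    start = max(0, t - tau + 1)
    return set(refs[start:t + 1])


def intersection_size(refs, t, y, tau):
    """Number of pages common to the working sets at times t and t + y."""
    if t + y >= len(refs):
        raise ValueError("t + y lies beyond the end of the trace")
    return len(working_set(refs, t, tau) & working_set(refs, t + y, tau))


# Hypothetical reference string.
refs = ["a", "b", "a", "c", "c", "c", "d"]
overlap = [intersection_size(refs, t=2, y=y, tau=3) for y in range(4)]
```

An overlap that decays slowly with y suggests that the working set
observed at time t remains a good predictor of the pages needed
shortly afterward.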


(5). The running-time distribution $F_q(u)$. (cf. Section 6.2.2).
In the case of single-process computations, this distribution
is useful for determining processor demand. This distribution
may not be particularly valuable in the case of multiprocess
computations, in which we are more interested in the number,
rather than the duration, of component processes. Moreover,
since q is defined to be the interval between successive
interactions, the distribution $F_q(u)$ tells how often a process
will be blocked.


(6). Joint demand distribution $F_{pm}(u,v)$. (cf. Section 7.4).
Knowledge of this distribution is needed to obtain a solution
(a balance policy) from the mathematical programming problem
described in Sections 7.3 and 7.4. Assuming a fair balance
policy, $F_{pm}(u,v)$ is easily measured by taking samples of the
jobs in the standby set queues. Knowledge of this distribution
is also invaluable for assisting the administration in setting
prices and deciding when to purchase equipment. If it is observed
that either of Pr[p=1] or Pr[m=1] is not small, then either price
controls must be enforced to reduce demand or more equipment must
be purchased.


(7). Queue length distributions $F_n(y \mid u,v)$. This gives the
length n of the queue at the point (u,v) in the standby set
demand space (Section 7.4). This is again useful for finding
the optimum balance policy and for indicating to the administration
when the total demand is high enough to warrant new equipment.


(8). Duty factor $\eta(\tau)$. (cf. Sections 4.5 and 5.6). Defined
as the fraction of its time in the balance set during which a
process is not in page wait, the duty factor is useful for
determining sensitivity to thrashing (Section 3.6), for determining
the equipment configuration (Section 8.2), and for estimating
processing efficiency.
(9). Missing-page probability $\lambda(\tau)$. (cf. Sections 4.2
and 5.4). Useful for determining sensitivity to thrashing, and for
estimating paging rates. It can be measured in a time interval I as

$$ \lambda(\tau) = \frac{\text{number of page faults in } I}{\text{number of references in } I} $$

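
The ratio can be sketched for a recorded reference string as
follows. The sketch assumes, as a simplification of ours, that a
reference is a page fault exactly when the page was not among the
previous tau references, that is, when it lies outside the working
set just before the reference. The function name and the trace are
hypothetical.

```python
def missing_page_probability(refs, tau):
    """Fraction of references that fault under a working set of window tau.

    Assumption: only pages referenced in the previous tau references
    are resident at the moment of each reference.
    """
    if not refs:
        return 0.0
    faults = 0
    for t, page in enumerate(refs):
        resident = refs[max(0, t - tau):t]  # the previous tau references
        if page not in resident:
            faults += 1
    return faults / len(refs)


# Hypothetical reference string.
refs = ["a", "b", "a", "c", "c", "c", "d"]
prob_tau3 = missing_page_probability(refs, tau=3)
prob_tau1 = missing_page_probability(refs, tau=1)
```

As expected, the smaller window yields the larger fault ratio on
this trace.
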
(10). Traverse time T. It is useful to know how often references
are made to slower, lower levels of memory, for purposes of
determining sensitivity to thrashing and memory system requirements.


(11). Variation of balance set demand. For a given balance
policy, we can perform experiments to observe the variation of
balance set demand $(p_b, m_b)$ about the desired $(\alpha,\beta)$.
Doing this for a family of $(\alpha,\beta)$ values will yield
information useful for determination of $(\alpha,\beta)$.
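
One way to summarize such an experiment is the root-mean-square
deviation of the observed demand from the desired point, taken
separately in the processor and memory coordinates. This particular
statistic is our choice for illustration and is not prescribed by
the text; the function name and sample values are hypothetical.

```python
import math


def demand_variation(samples, alpha, beta):
    """RMS deviation of observed (processor, memory) demand about (alpha, beta)."""
    if not samples:
        raise ValueError("at least one sample is required")
    n = len(samples)
    dev_p = math.sqrt(sum((p - alpha) ** 2 for p, _ in samples) / n)
    dev_m = math.sqrt(sum((m - beta) ** 2 for _, m in samples) / n)
    return dev_p, dev_m


# Hypothetical samples of balance set demand, as fractions of capacity.
samples = [(0.8, 0.7), (1.0, 0.9), (0.9, 0.8)]
dev_p, dev_m = demand_variation(samples, alpha=0.9, beta=0.8)
```

Repeating the measurement for several settings of the balance
parameters gives one deviation pair per setting, which can then be
compared.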


(12). Demand vs. cost curves. (cf. Section 1.4.1). The steady
state curves discussed in Chapter 1 relating cost per unit
resource to total community demand would be valuable for assisting
administration officials in setting prices. These curves can be
composed from the joint demand distribution $F_{pm}(u,v)$ resulting
from particular price settings. The administration may have to
experiment with prices in order to determine the general character
of the curve.


(13). Bidding and inflation. (cf. Section 1.4.3). Assuming the
existence of a bidding mechanism, it is necessary to know
whether inflation is a problem.
9.2. Charging for Resource Use


Given that memory management at all levels of the memory 
hierarchy is controlled by means of working set or related stra- 
tegies, we observe that there may be no need to explicitly bill 
for processor usage, because a process receives service from 
a processor if and only if its working set is in main memory. 
We merely charge an account for the size and duration of its 
main memory usage; in so doing we implicitly obtain processor 
usage. 

Thus, let $w_p(t)$ denote the number of pages in main memory
at time t belonging to process p. If $w_p(t) = 0$ we understand
that p has no pages in main memory (i.e., it is neither running
nor in page wait). The cost $C_p(I)$ to process p during a real
time interval I for main memory usage is

(9.2.1)    $C_p(I) = c_0 \int_I w_p(t)\,dt$,    some $c_0 > 0$

and $C_p(I)$ implies both processor and memory usage.

We do not mean to imply that processor usage ought not be 
metered. We only mean to point out that the same mechanism 
that meters memory usage can be used to infer processor usage 
costs. 
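
In a running system the integral of eq. 9.2.1 would be accumulated
from periodic samples. The sketch below approximates it by a sum
over equally spaced samples of the number of pages held by the
process; the sampling scheme, the function name, and the numbers are
assumptions made for illustration.

```python
def memory_charge(page_counts, dt, c0):
    """Approximate c0 times the integral of the page count over an interval.

    page_counts: samples of the number of main memory pages held by
    the process, taken every dt time units (left Riemann sum).
    """
    if dt <= 0 or c0 <= 0:
        raise ValueError("dt and c0 must be positive")
    return c0 * dt * sum(page_counts)


# Hypothetical samples; zeros mark periods with no pages in main memory.
charge = memory_charge([4, 4, 6, 0, 0, 5], dt=0.5, c0=2.0)
```

Periods in which the process holds no pages contribute nothing to
the charge, which is how the memory meter also accounts for
processor usage.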

When there is sharing, we follow the ideas of Section 5.1,
letting a page in main memory belong to the working set of the
process that most recently referenced it. In this case $w_p(t)$
still measures the number of pages belonging to process p at
time t, and the cost is still given by eq. 9.2.1. The problem
of attributing pages to processes is an implementation problem,
and has already been discussed in Section 5.1.
Eq. 9.2.1 can be extended easily to memory usage costs
in multilevel memories, where now an owner (Section 5.1) has
pages or segments stored at various levels. Let $w_j^k(t)$ denote
the number of pages held by owner j at level $M_k$, and suppose
$c_k$ is the cost per unit time to store one page at level $M_k$.
Then, during an interval I, owner j is charged

(9.2.2)    $C_j(I) = \int_I \sum_{k=1}^{n} c_k \, w_j^k(t)\,dt$,    some $c_k > 0$

for his resource usage.
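
The multilevel charge can be approximated in the same sampled
fashion as the single-level one. In the sketch below each memory
level has its own list of equally spaced page-count samples and its
own cost per page per unit time; the function name, the sampling
scheme, and the numbers are hypothetical.

```python
def multilevel_charge(counts_by_level, costs, dt):
    """Approximate the charge to one owner across all memory levels.

    counts_by_level[k]: samples, every dt time units, of the pages the
    owner holds at level k. costs[k]: cost per page per unit time there.
    """
    if len(counts_by_level) != len(costs):
        raise ValueError("one cost is required per memory level")
    if dt <= 0 or any(c <= 0 for c in costs):
        raise ValueError("dt and all costs must be positive")
    return dt * sum(c * sum(counts) for c, counts in zip(costs, counts_by_level))


# Hypothetical two-level memory: fast level first, slower level second.
charge = multilevel_charge([[4, 4, 2], [10, 10, 12]], costs=[2.0, 0.5], dt=1.0)
```

With a single level and a single cost the computation reduces to
the sampled form of eq. 9.2.1.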
CHAPTER 10


Conclusions 


By constructing abstract behavior models for ongoing 
multiprocess computations, we have intended to build a frame- 
work within which we can understand misunderstood problems, 
answer unanswered questions, and foresee unforeseen difficulties. 

Perhaps more important than the particular models is the 
basic approach. Every one of the models is based on an appro- 
priate locality concept. 

For a variety of reasons it is natural to suppose that, 
during any interval of execution, the majority of programs 
will favor a subset of their information, exhibiting locality 
in their reference patterns. 

A process’s working set of information -- the pages it has
referenced during the last T units of execution -- is a measure
of the set of favored pages. Main memory allocation strategies
that grant processors to processes if and only if their working
sets are present in main memory can minimize both memory
usage costs and the possibility of thrashing. By defining a
page’s reference density -- the fraction of the last T refer- 
ences it received -- we can refine the notion of a working set 
for memory systems having several levels of directly-addressable 
memory. Pages with the highest reference densities reside in 
the highest levels, and pages with the lowest reference den- 
sities reside in the lowest levels. 

The locality concept behind these working set models 
assumes that a process is unlikely to abruptly change either 
its favored pages or its reference densities. 

It is quite clear that resource allocation can be very 
effective if programs do in fact exhibit the locality properties 
we assume. Indeed, the more pronounced the locality behavior, 
the more successful the resource allocation. Because the con- 
cept of a working set is defined independently of a computer 
system, it is perfectly reasonable to encourage programmers to 
construct their programs to have small, compact working sets. 
There is no need to resort to absurdities, like a declare 
working set statement in PL/I; all that is necessary is that 
a programmer get organized, avoid unnecessary jumping from 
region to region in name space, and employ algorithms and data 
structures that induce highly local reference patterns. 

The definition of system demand is another application of 
locality concepts, for we assume that it is possible to measure, 
and act on, a computation’s demand before the demand can change 


significantly. 
The definition of system demand can be extended from
two resource types -- processor and memory -- to n resource
types. One simply defines an n-tuple, whose i-th position
contains a measure of demand for the i-th resource type. Demand
for resource types beyond processor and memory can be defined
once the appropriate locality concept has been recognized.

Thus, by first asking ourselves the question: What is the
locality concept applicable here?, we have been able to construct
useful models for program behavior. We suspect that this
sort of approach to constructing behavior models may be useful
in other areas as well.

The model of a balanced computer system has given insights
into the causes of thrashing, into the equipment configuration 
problem, into means of satisfying other scheduling objectives 
beyond balance, and into methods of analysis. 

When the computer system is continuously balanced, the
demand of the balance set is tightly distributed about the de- 
sired demand. Although we cannot accurately predict the demand 
of an arbitrarily given computation, we can accurately predict 
the demand of the balance set. For this reason it is possible 
not only to avoid thrashing, but also to effect the proper 
equipment configuration and be confident that it is correctly 
matched to the work load. 

Balance policies are flexible. By formulating a mathe- 
matical programming problem whose objective function is arbit- 
rary, whose constraints enforce both balance and fairness, and 
whose solution is the set of jobs to be admitted to the balance 
set, we showed that it is possible to establish reasonable
policies with respect to other, arbitrarily given criteria
(such as minimum response time). 

The model of a balanced computer system has shown that 
analysis is possible because computations can be made independent 
of one another, inasmuch as resource acquisitions of one compu- 
tation do not interfere with resources in use by another. This 
model has also shown that processor and memory demands cannot 
be treated independently: resource allocation decisions must 
account for both demands at the same time. 

The model of a balanced computer system has many applic- 
ations to contemporary and future problems of computer system 
organization. This model gives quantitative justification to 
many intuitive ideas; for example, the intuitive notion of a 
working set, or the benefits obtainable by sharing information 
and pooling equipment, or the dependence of thrashing on memory 
traverse times. This model affords possible solutions to prob- 
lems for which we have no previous answers, such as the equip- 
ment configuration problem or the thrashing problem. This 
model makes clear which program behavior parameters are impor- 
tant, and what performance measures ought to be used. The model 
suggests better system organizations, better resource allocation 
policies. The model can make system designers and administrators 
feel confident that there is theoretical justification to their 
decisions. Finally, the model has shown that we are only start- 
ing on the long road to understanding the complex behavior of 
computations and other information-processing activities. 

If we have answered some questions, we have raised others. 
Many of these have already been indicated throughout the text.
The most troublesome problems arise when information is
shared. In our work here, we have made processes statistically 
independent, an assumption that is valid only if processes do 
not communicate or if shared data is not interlocked. Clearly, 
many interesting questions concern non-independent processes. 

It is evident that we want two processes to run concurrently 
whenever they share information. Ideally, we want to use 
scheduling mechanisms and policies that somehow automatically 
group processes together, according as they share information. 
More work is needed in this area. 

We defined a computation to be a collection of mutually 
cooperating processes and information operating in the same name 
space, so that a computation is behaviorally well-defined. Might 
a vaguer definition lead to even more useful models? Can we de- 
fine degrees of cooperation among processes and let the member- 
ship of a computation vary dynamically, according to degrees 
of cooperation? More work is needed in this area. 

Another direction in which the work can be extended is the
so-called distributed data problem. What locality and working
set concepts are important when the data is geographically scat- 
tered, as might be the case in a computer network? Is there any 
way to anticipate, on the basis of present or past behavior, 
when information should be moved from one geographic location 
to another? 

We have intended to devise new approaches to modelling com- 
putations, to spark a new kind of thinking about dynamic infor- 
mation processing activities, and to develop new philosophies 
about resource sharing and allocation. We sincerely hope we 
have raised more questions than we have answered.